Abstract

The purpose of this paper is to study the dynamical behavior of 
the sequence produced by a forward-backward algorithm, involving two 
random maximal monotone operators and a sequence of decreasing step 
sizes. Defining a mean monotone operator as an Aumann integral, 
and assuming that the sum of the two mean operators is maximal 
(sufficient maximality conditions are provided), it is shown that with 
probability one, the interpolated process obtained from the iterates is 
an asymptotic pseudo trajectory in the sense of Benaïm and Hirsch of
the differential inclusion involving the sum of the mean operators. The 
convergence of the empirical means of the iterates towards a zero of the 
sum of the mean operators is shown, as well as the convergence of the 
sequence itself to such a zero under a demipositivity assumption. These 
results find applications in a wide range of optimization problems or 
variational inequalities in random environments. 

Keywords : Dynamical systems, Random maximal monotone operators, 
Stochastic forward-backward algorithm, Stochastic proximal point algorithm. 
1 Introduction

In the fields of convex analysis and monotone operator theory, the forward-
backward splitting algorithm [1, 2] is one of the most often studied
techniques for iteratively finding a zero of a sum of two maximal monotone
operators. This problem finds applications in convex minimization
problems. Indeed, when each of the two maximal monotone operators coincides
with the subdifferential of a proper and lower semicontinuous convex
function, the forward-backward algorithm converges to a minimizer of the sum
of the two functions, provided some conditions are met. Other applications
include saddle point problems and variational inequalities. Each iteration
of the algorithm involves a forward step, where one of the operators is used
explicitly, followed by a backward step that consists in applying the resolvent
of the second operator to the output of the forward step.

The purpose of this paper is to study a version of the forward-backward 
algorithm, where at each iteration, each of the two operators is replaced 
with an operator that has been randomly chosen amongst a collection of
maximal monotone operators. The sequence of random monotone operators 
is assumed to be independent and identically distributed (in a sense that 
will be made clear below), and the step size of the algorithm is supposed 
to approach zero as the number of iterations goes to infinity, in order to 
alleviate the noise effect due to the randomness. 

The aim is to study the dynamical behavior of the stochastic sequence 
generated by the above algorithm. Our main result states that the piecewise
linear interpolation of the output sequence is an asymptotic pseudo
trajectory (APT) [3, 4] of a certain semiflow, which we shall characterize
below. Loosely speaking, it means that the iterates of our stochastic forward-
backward algorithm asymptotically “shadow” the trajectory of a continuous
time dynamical system, hence inheriting its convergence properties. In our
case, the latter dynamical system is taken as a differential inclusion
involving the sum of the Aumann expectations of the randomly chosen maximal
monotone operators [5, 6], as also introduced in the recent paper [7].

The convergence of the algorithm towards an element of the set of zeros 
of the sum of the Aumann expectations is of obvious interest. In this regard, 
the above APT property yields two important corollaries. Using a result of 
[8], we show that the sequence of empirical means of the iterates converges 
almost surely (a.s.) to a (random) element of the set of zeros. Moreover, 
when the sum of the Aumann expectations is assumed demipositive [9], we 
prove that the sequence of iterates converges a.s. to a zero. Verifiable 
conditions for demipositivity can be easily devised. 

This paper is organized as follows. Section 2 provides the theoretical 
background. Section 3 introduces the main algorithm and states the main 
results. Section 4 reviews some applications to convex minimization
problems. Related works are discussed in Section 5. Proofs are provided in
Section 6. Perspectives and conclusions are addressed in Sections 7 and 8 
respectively. 
2 Preliminaries


2.1 Monotone Operators

A set-valued operator $A : \mathbb{R}^N \to 2^{\mathbb{R}^N}$, where $N$ is some positive integer, is
said to be monotone if $\forall (x,y) \in \mathrm{gr}(A)$, $\forall (x',y') \in \mathrm{gr}(A)$,
$\langle y - y', x - x' \rangle \geq 0$, where $\mathrm{gr}(A)$ stands for the graph of $A$.
A non-empty monotone operator is said to be maximal if its graph is a
maximal element, in the inclusion ordering, among the graphs of monotone
operators. A typical maximal monotone operator is the subdifferential of a
function belonging to $\Gamma_0$, the family of proper and lower semicontinuous
convex functions on $\mathbb{R}^N$. We use $\mathcal{M}$ to represent the set of maximal
monotone operators on $\mathbb{R}^N$ and let $\mathrm{dom}(A) := \{x \in \mathbb{R}^N : A(x) \neq \emptyset\}$ be the
domain of the operator $A$.

Given that $A, B \in \mathcal{M}$, where $B$ is assumed to be single-valued and where
$\mathrm{dom}(B) = \mathbb{R}^N$, the forward-backward algorithm reads

$$ x_{n+1} = (I + \gamma A)^{-1}(x_n - \gamma B(x_n)), \tag{1} $$

where $I$ is the identity operator, $\gamma$ is a real positive step, and $(\cdot)^{-1}$ is the
inverse operator defined by the fact that $(x,y) \in \mathrm{gr}(A^{-1}) \Leftrightarrow (y,x) \in \mathrm{gr}(A)$
for an operator $A$. The operator $(I + \gamma A)^{-1}$, called the resolvent, is
single-valued with the domain $\mathbb{R}^N$ since $A \in \mathcal{M}$ [10, 11]. In the special case where
$A$ is equal to the subdifferential $\partial f$ of a function $f \in \Gamma_0$, the resolvent is also
referred to as the proximity operator, and we note $\mathrm{prox}_{\gamma f}(x) = (I + \gamma \partial f)^{-1}(x)$.

We denote the set of zeros of $A$ as $Z(A) := \{x \in \mathbb{R}^N : 0 \in A(x)\}$.
Assuming that $B$ is so-called cocoercive, and that $\gamma$ satisfies a certain
condition, the forward-backward algorithm is known to converge to an element
of $Z(A + B)$, provided the latter set is not empty [11].
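
As an illustration of iteration (1), here is a minimal Python sketch. It is not taken from the paper: the choice $A = \lambda\,\partial\|\cdot\|_1$ (whose resolvent is soft-thresholding) and $B(x) = x - c$ (a cocoercive gradient) is an assumption made for the example, as are the names `soft_threshold` and `forward_backward` and the values of `gamma`, `lam` and `c`. For this choice the unique zero of $A + B$ is the soft-thresholding of $c$ at level $\lambda$.

```python
def soft_threshold(v, t):
    """Resolvent (I + t * d||.||_1)^{-1}, applied componentwise."""
    return [max(abs(vi) - t, 0.0) * (1.0 if vi > 0 else -1.0) for vi in v]


def forward_backward(c, lam, gamma=0.5, n_iter=200):
    """Iterate x <- (I + gamma*A)^{-1}(x - gamma*B(x)) with
    A = lam * d||.||_1 and B(x) = x - c (illustrative choices)."""
    x = [0.0] * len(c)
    for _ in range(n_iter):
        # Forward (explicit) step on B.
        forward = [xi - gamma * (xi - ci) for xi, ci in zip(x, c)]
        # Backward (implicit) step: resolvent of gamma * A.
        x = soft_threshold(forward, gamma * lam)
    return x


c = [2.0, -0.3, 1.0]
x_star = forward_backward(c, lam=0.5)
expected = soft_threshold(c, 0.5)  # zero of A + B for this toy problem
```

With `gamma = 0.5` the iteration map is a contraction for this particular $B$, so `x_star` agrees with `expected` up to rounding.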

2.2 Set-Valued Functions and Set-Valued Integrals 

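Let $(\Xi, \mathscr{T}, \mu)$ be a probability space, where $\mathscr{T}$ is $\mu$-complete. Consider the
space $\mathbb{R}^N$ equipped with its Borel field $\mathscr{B}(\mathbb{R}^N)$ and let $F : \Xi \to 2^{\mathbb{R}^N}$ be
a set-valued function such that $F(\xi)$ is a closed set for any $\xi \in \Xi$. The
set-valued function $F$ is said to be measurable if $\{\xi : F(\xi) \cap H \neq \emptyset\} \in \mathscr{T}$
for any set $H \in \mathscr{B}(\mathbb{R}^N)$. This is known to be equivalent to asserting that
the domain $\mathrm{dom}(F) := \{\xi \in \Xi : F(\xi) \neq \emptyset\}$ of $F$ belongs to $\mathscr{T}$, and that
there exists a sequence of measurable functions $\varphi_n : \mathrm{dom}(F) \to \mathbb{R}^N$ such
that $F(\xi) = \mathrm{cl}(\{\varphi_n(\xi)\})$ for all $\xi \in \mathrm{dom}(F)$ [12, Chap. 3] [13]. Assume
now that $F$ is measurable and that $\mu(\mathrm{dom}(F)) = 1$. For $1 \leq p < \infty$, let
$\mathcal{L}^p(\Xi, \mathscr{T}, \mu; \mathbb{R}^N)$ be the Banach space of measurable functions $\varphi : \Xi \to \mathbb{R}^N$
with $\int \|\varphi\|^p \, d\mu < \infty$, and let

$$ S_F^p := \{\varphi \in \mathcal{L}^p(\Xi, \mathscr{T}, \mu; \mathbb{R}^N) : \varphi(\xi) \in F(\xi) \ \mu\text{-a.e.}\}. \tag{2} $$

If $S_F^1 \neq \emptyset$, then the function $F$ is said to be integrable. The Aumann integral
[5, 6] of $F$ is the set

$$ \int F \, d\mu := \left\{ \int_\Xi \varphi \, d\mu \ : \ \varphi \in S_F^1 \right\}. $$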

2.3 Random Maximal Monotone Operators 


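Consider the function $A : \Xi \to \mathcal{M}$. Note that the graph $\mathrm{gr}(A(\xi, \cdot))$ of any
element $A(\xi, \cdot)$ is a closed subset of $\mathbb{R}^N \times \mathbb{R}^N$ by the maximality of $A(\xi, \cdot)$
[10, Prop. 2.5]. Assume that the function $\xi \mapsto \mathrm{gr}(A(\xi, \cdot))$ is measurable as a
closed set-valued $\Xi \to 2^{\mathbb{R}^N \times \mathbb{R}^N}$ function. It is shown in [14, Ch. 2] that this
is equivalent to saying that the function $\xi \mapsto (I + \gamma A(\xi, \cdot))^{-1}(x)$ is measurable
from $\Xi$ to $\mathbb{R}^N$ for any $\gamma > 0$ and any $x \in \mathbb{R}^N$. If the domain of $A(\xi, \cdot)$ is
represented by $D(\xi)$, the measurability of $\xi \mapsto \mathrm{gr}(A(\xi, \cdot))$ implies that the
set-valued function $\xi \mapsto \mathrm{cl}(D(\xi))$ is measurable. Moreover, recalling that
$A(\xi, x)$ is the image of a given $x \in \mathbb{R}^N$ under the operator $A(\xi, \cdot)$, the
set-valued function $\xi \mapsto A(\xi, x)$ is measurable [14, Ch. 2]. Given $x \in D(\xi)$,
the element of least norm in $A(\xi, x)$ is denoted as $A_0(\xi, x)$. In other words,
$A_0(\xi, x) = \mathrm{proj}_{A(\xi,x)}(0)$. It is known that the function $\xi \mapsto A_0(\xi, x)$ is
measurable [14, Ch. 2].

For any $\gamma > 0$, the resolvent of $A(\xi, \cdot)$ is represented by

$$ J_\gamma(\xi, x) := (I + \gamma A(\xi, \cdot))^{-1}(x). $$

As we know, $J_\gamma(\xi, \cdot)$ is a non-expansive function on $\mathbb{R}^N$. Since $J_\gamma(\xi, x)$ is
measurable in $\xi$ and continuous in $x$, Carathéodory's theorem shows that the
function $J_\gamma : \Xi \times \mathbb{R}^N \to \mathbb{R}^N$ is $\mathscr{T} \otimes \mathscr{B}(\mathbb{R}^N)$ measurable. We also introduce
the Yosida approximation $A_\gamma(\xi, \cdot)$ of $A(\xi, \cdot)$, which is defined for any $\gamma > 0$
as the $\mathscr{T} \otimes \mathscr{B}(\mathbb{R}^N)$ measurable function

$$ A_\gamma(\xi, x) := \frac{x - J_\gamma(\xi, x)}{\gamma}. $$

The function $A_\gamma(\xi, \cdot)$ is a $\gamma^{-1}$-Lipschitz continuous function that satisfies
$\|A_\gamma(\xi, x)\| \uparrow \|A_0(\xi, x)\|$ and $A_\gamma(\xi, x) \to A_0(\xi, x)$ for any $x \in D(\xi)$ when
$\gamma \downarrow 0$. Moreover, the inclusion $A_\gamma(\xi, x) \in A(\xi, J_\gamma(\xi, x))$ holds true for all
$x \in \mathbb{R}^N$ [10, 11].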
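
The properties of the resolvent and of the Yosida approximation recalled above can be checked numerically on a scalar toy operator. The sketch below is only an illustration under an assumed choice, $A = \partial|\cdot|$ on $\mathbb{R}$ with no dependence on $\xi$; the function names are hypothetical. For this operator the resolvent is soft-thresholding and the Yosida approximation is $x/\gamma$ clipped to $[-1, 1]$.

```python
def resolvent(x, gamma):
    """J_gamma(x) = (I + gamma*A)^{-1}(x) for A = subdifferential of |.|."""
    return max(abs(x) - gamma, 0.0) * (1.0 if x > 0 else -1.0)


def yosida(x, gamma):
    """Yosida approximation A_gamma(x) = (x - J_gamma(x)) / gamma."""
    return (x - resolvent(x, gamma)) / gamma


def least_norm(x):
    """A_0(x): least-norm element of the subdifferential of |.| at x."""
    return 0.0 if x == 0 else (1.0 if x > 0 else -1.0)


x = 0.3
gammas = [1.0, 0.5, 0.1, 0.01]  # decreasing towards 0
# |A_gamma(x)| increases towards |A_0(x)| as gamma decreases.
values = [abs(yosida(x, g)) for g in gammas]

# gamma^{-1}-Lipschitz property, checked on a few pairs of points.
pairs = [(-2.0, 1.5), (0.05, 0.2), (0.3, 0.31)]
lipschitz_ok = all(
    abs(yosida(a, g) - yosida(b, g)) <= abs(a - b) / g + 1e-12
    for g in gammas
    for a, b in pairs
)
```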

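The essential intersection $\mathcal{D}$ of the domains $D(\xi)$ is [15]

$$ \mathcal{D} := \bigcup_{E \in \mathscr{T} : \mu(E) = 0} \ \bigcap_{\xi \in \Xi \setminus E} D(\xi), $$

in other words, $x \in \mathcal{D} \Leftrightarrow \mu(\{\xi : x \in D(\xi)\}) = 1$. Let us assume that
$\mathcal{D} \neq \emptyset$ and that the function $A(\cdot, x)$ is integrable for each $x \in \mathcal{D}$. On $\mathcal{D}$, we define
$\mathcal{A}$ as the Aumann integral

$$ \mathcal{A}(x) := \int_\Xi A(\xi, x) \, \mu(d\xi). $$

One can immediately see that the operator $\mathcal{A} : \mathcal{D} \to 2^{\mathbb{R}^N}$ so defined is a
monotone operator.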

2.4 Evolution Equations and Almost Sure APT

Given that $A \in \mathcal{M}$, consider the differential inclusion

$$ \dot z(t) \in -A(z(t)) \ \text{a.e. on } \mathbb{R}_+, \qquad z(0) = z_0, \tag{3} $$

for a given $z_0$ in $\mathrm{dom}(A)$. It is known from [10, 16] that for any $z_0 \in
\mathrm{dom}(A)$, there exists a unique absolutely continuous function $z : \mathbb{R}_+ \to \mathbb{R}^N$
satisfying (3), referred to as the solution to (3). Consider the map

$$ \Psi : \mathrm{dom}(A) \times \mathbb{R}_+ \to \mathrm{dom}(A), \qquad (z_0, t) \mapsto z(t), $$

where $z(t)$ is the solution to (3) with the initial value $z_0$. Then, for any $t \geq 0$,
$\Psi(\cdot, t)$ is a non-expansive map from $\mathrm{dom}(A)$ to $\mathrm{dom}(A)$ which can be extended
by continuity to a non-expansive map from $\mathrm{cl}(\mathrm{dom}(A))$ to $\mathrm{cl}(\mathrm{dom}(A))$ that we
still denote as $\Psi(\cdot, t)$ [10, 16]. The function $\Psi$ so defined is a semiflow on the
set $\mathrm{cl}(\mathrm{dom}(A)) \times \mathbb{R}_+$, being a continuous function from $\mathrm{cl}(\mathrm{dom}(A)) \times \mathbb{R}_+$ to
$\mathrm{cl}(\mathrm{dom}(A))$, satisfying $\Psi(\cdot, 0) = I$ and $\Psi(z_0, t + s) = \Psi(\Psi(z_0, s), t)$ for every
$z_0 \in \mathrm{cl}(\mathrm{dom}(A))$, $t, s \geq 0$. The set $\gamma(x) := \{\Psi(x, t) : t \geq 0\}$ is the orbit of $x$.
Although orbits of $\Psi$ are not necessarily convergent in general, any solution
to (3) converges to a zero of $A$ (which is assumed to exist) whenever $A$ is
demipositive [9]. By demipositive, we mean that there exists $w \in Z(A)$ such
that for every sequence $((u_n, v_n) \in A)$ such that $(u_n)$ converges to $u$ and
$(v_n)$ is bounded,

$$ \langle u_n - w, v_n \rangle \xrightarrow[n \to \infty]{} 0 \ \Rightarrow \ u \in Z(A). $$

We now need to introduce some important notions associated with the
semiflow $\Psi$. A comprehensive treatment of the subject can be found in [3,
17]. A set $S \subset \mathrm{cl}(\mathrm{dom}(A))$ is said to be invariant for the semiflow $\Psi$ if
$\Psi(S, t) = S$ for all $t \geq 0$. Given that $\varepsilon > 0$ and $T > 0$, an $(\varepsilon, T)$-pseudo orbit
from a point $a$ to a point $b$ in $\mathbb{R}^N$ is an $n$-tuple of partial orbits
$(\{\Psi(y_i, s) : s \in [0, t_i]\})_{i=0,\dots,n-1}$ such that $t_i \geq T$ for $i = 0, \dots, n-1$, and

$$ \|y_0 - a\| < \varepsilon, $$
$$ \|\Psi(y_i, t_i) - y_{i+1}\| < \varepsilon, \quad i = 0, \dots, n-1, $$
$$ y_n = b. $$

Let $S$ be a compact and invariant set for $\Psi$. If for every $\varepsilon > 0$, $T > 0$ and
every $a, b \in S$, there is an $(\varepsilon, T)$-pseudo orbit from $a$ to $b$, then the set $S$ is
said to be Internally Chain Transitive (ICT). We shall say that a random
process $v(t)$ on $\mathbb{R}_+$, which is valued in $\mathbb{R}^N$, is an almost sure asymptotic
pseudo trajectory [3, 4] for the differential inclusion (3) if

$$ \sup_{s \in [0, T]} \left\| v(t + s) - \Psi\big(\mathrm{proj}_{\mathrm{cl}(\mathrm{dom}(A))}(v(t)), s\big) \right\| \xrightarrow[t \to \infty]{} 0 \quad \text{a.s.} $$

for any $T > 0$. We note that in the APT definition provided in [3, 4], no
projection is considered because the flow is defined in these references on the
whole space. Projecting on $\mathrm{cl}(\mathrm{dom}(A))$ here does not alter the conclusions.
Let $L(v) := \bigcap_{t \geq 0} \mathrm{cl}(v([t, \infty[))$ be the limit set of the trajectory $v(t)$, i.e.,
the set of the limits of the convergent subsequences $v(t_k)$ as $t_k \to \infty$. It is
important to note that if $(v(t))_{t \in \mathbb{R}_+}$ is bounded a.s., and if $v$ is an almost
sure APT for (3), then with probability one, the compact set $L(v)$ is ICT
for the semiflow $\Psi$ [3].

The authors of [8] establish a useful property of asymptotic pseudo
trajectories pertaining to the asymptotic behavior of their empirical measures.
We now consider that $v : \Omega \times \mathbb{R}_+ \to \mathbb{R}^N$ is a random process on the
probability space $(\Omega, \mathscr{F}, \mathbb{P})$ equipped with a filtration $(\mathscr{F}_t)_{t \geq 0}$. As we know,
$v$ is said to be progressively measurable if for each $t \geq 0$, the restriction to
$\Omega \times [0, t]$ of $v$ is $\mathscr{F}_t \otimes \mathscr{B}([0, t])$-measurable, where $\mathscr{B}([0, t])$ is the Borel field
over $[0, t]$. For $t > 0$, the empirical measure $\mu_t(\omega, \cdot)$ of $v$ is then the random
probability measure, defined by the identity

$$ \int f(x) \, \mu_t(\omega, dx) = \frac{1}{t} \int_0^t f(v(\omega, s)) \, ds $$

for any measurable function $f : \mathbb{R}^N \to \mathbb{R}_+$. We also note that a probability
measure $\nu$ on $\mathbb{R}^N$ is said to be invariant for the semiflow $\Psi$ if

$$ \int f(\Psi(x, t)) \, \nu(dx) = \int f(x) \, \nu(dx) $$

for any $t \geq 0$ and any measurable function $f : \mathbb{R}^N \to \mathbb{R}_+$.

Now, if $v$ is progressively measurable and if it is an almost sure APT for
the semiflow $\Psi$, then on a probability one set, all of the accumulation points
of the set $\{\mu_t(\omega, \cdot)\}_{t > 0}$ for the weak convergence of probability measures are
invariant measures for $\Psi$ [8, Th. 1].¹

¹ The result is stated in [8] when $v$ is a so-called weak APT. It turns out that any almost
sure APT is a weak APT by Lévy's conditional form of the Borel-Cantelli lemma.

3 Results

3.1 Algorithm Description and Main Results

Let $B : \Xi \to \mathcal{M}$ be a mapping such that, similarly to the mapping $A$
introduced in Section 2.3, the function $\xi \mapsto \mathrm{gr}(B(\xi, \cdot))$ is measurable. Moreover,
we assume throughout the paper that $\mathrm{dom}(B(\xi, \cdot)) = \mathbb{R}^N$ for almost every
$\xi \in \Xi$. We also assume that for every $x \in \mathbb{R}^N$, $B(\cdot, x)$ is integrable, and
we set $\mathcal{B}(x) := \int B(\xi, x) \, \mu(d\xi)$. Note that $\mathrm{dom}\, \mathcal{B} = \mathbb{R}^N$. Let $(u_n)_{n \in \mathbb{N}^*}$ be
an iid sequence of random variables from a probability space $(\Omega, \mathscr{F}, \mathbb{P})$ to
$(\Xi, \mathscr{T})$ having the distribution $\mu$. Starting with some arbitrary $x_0 \in \mathbb{R}^N$,
our purpose is to study the behavior of the iterates

$$ x_{n+1} = J_{\gamma_{n+1}}\big(u_{n+1}, \, x_n - \gamma_{n+1} b(u_{n+1}, x_n)\big), \quad (n \in \mathbb{N}), \tag{4} $$

where the positive sequence $(\gamma_n)_{n \in \mathbb{N}^*}$ belongs to $\ell^2 \setminus \ell^1$ and where $b$ is a
measurable map on $(\Xi \times \mathbb{R}^N, \mathscr{T} \otimes \mathscr{B}(\mathbb{R}^N)) \to (\mathbb{R}^N, \mathscr{B}(\mathbb{R}^N))$ such that for
every $x \in \mathbb{R}^N$, $b(\cdot, x) \in S^1_{B(\cdot, x)}$ in the sense of (2). A possible choice for $b$ is
$b(\xi, x) = B_0(\xi, x)$, which is $\mathscr{T} \otimes \mathscr{B}(\mathbb{R}^N)$-measurable, as the limit as $\gamma \downarrow 0$ of $B_\gamma(\xi, x)$.
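We define the affine interpolated process as

$$ x(t) := x_n + \frac{x_{n+1} - x_n}{\gamma_{n+1}} (t - \tau_n) \tag{5} $$

for every $t \in [\tau_n, \tau_{n+1}[$, where $\tau_n = \sum_{k=1}^n \gamma_k$. Consider the differential inclusion

$$ \dot z(t) \in -(\mathcal{A} + \mathcal{B})(z(t)) \ \text{for a.e. } t \in \mathbb{R}_+, \qquad z(0) = z_0. \tag{6} $$

If $\mathcal{A} + \mathcal{B}$ is maximal, then for any $z_0 \in \mathcal{D}$, (6) has a unique solution, in which
case, $\Psi : \mathrm{cl}(\mathcal{D}) \times \mathbb{R}_+ \to \mathrm{cl}(\mathcal{D})$ will represent the semiflow associated to (6).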
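
To make iteration (4) and the interpolation (5) concrete, here is a minimal scalar simulation in Python. Everything specific in it is an assumption chosen for illustration and not taken from the paper: $\xi = (a, \lambda)$ with $a$ uniform on $[0, 2]$ and $\lambda$ uniform on $[0.2, 0.4]$, $B(\xi, x) = x - a$, $A(\xi, \cdot) = \lambda\,\partial|\cdot|$, and the step size $\gamma_n = n^{-0.75}$, which lies in $\ell^2 \setminus \ell^1$. For these choices the mean operators have the single zero $0.7$.

```python
import bisect
import random


def soft_threshold(v, t):
    """Resolvent of t * d|.| on the real line."""
    return max(abs(v) - t, 0.0) * (1.0 if v > 0 else -1.0)


def run(n_iter=20000, seed=0):
    """Simulate x_{n+1} = J_{gamma}(u, x_n - gamma * b(u, x_n)) on a toy model."""
    rng = random.Random(seed)
    xs, taus = [0.0], [0.0]  # iterates x_n and times tau_n (tau_0 = 0)
    for n in range(1, n_iter + 1):
        gamma = n ** -0.75
        a = rng.uniform(0.0, 2.0)    # random data for B(xi, x) = x - a
        lam = rng.uniform(0.2, 0.4)  # random weight for A(xi, .) = lam * d|.|
        forward = xs[-1] - gamma * (xs[-1] - a)
        xs.append(soft_threshold(forward, gamma * lam))
        taus.append(taus[-1] + gamma)
    return xs, taus


def interpolate(t, xs, taus):
    """Affine interpolated process x(t) of (5), for 0 <= t <= taus[-1]."""
    n = min(bisect.bisect_right(taus, t) - 1, len(xs) - 2)
    gamma_next = taus[n + 1] - taus[n]
    return xs[n] + (xs[n + 1] - xs[n]) / gamma_next * (t - taus[n])


xs, taus = run()
x_final = xs[-1]
x_mid = interpolate(0.5 * (taus[10] + taus[11]), xs, taus)
```

By construction `interpolate` returns $x_n$ at $t = \tau_n$ and the midpoint of $x_n$ and $x_{n+1}$ halfway between $\tau_n$ and $\tau_{n+1}$; the last iterate stays close to the zero $0.7$ of the mean problem.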

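Before stating our main result, we need to make a preliminary remark. A
point $x_\star$ is an element of $Z := Z(\mathcal{A} + \mathcal{B})$ if and only if there exist $\varphi \in S^1_{A(\cdot, x_\star)}$
and $\psi \in S^1_{B(\cdot, x_\star)}$ such that $\int \varphi \, d\mu + \int \psi \, d\mu = 0$. We will refer to a couple $(\varphi, \psi)$
of this type as a representation of the zero $x_\star$. Moreover, in Theorem 3.1
below, we shall assume that there exists such a zero $x_\star$ for which the above
functions $\varphi$ and $\psi$ can be chosen in $\mathcal{L}^{2p}(\Xi, \mathscr{T}, \mu; \mathbb{R}^N)$, where $p \geq 1$ is some
integer possibly strictly larger than one. We thus introduce the set of
$2p$-integrable representations

$$ \mathcal{R}_{2p}(x_\star) := \left\{ (\varphi, \psi) \in S^{2p}_{A(\cdot, x_\star)} \times S^{2p}_{B(\cdot, x_\star)} \ : \ \int \varphi \, d\mu + \int \psi \, d\mu = 0 \right\}. $$

We let $\Pi(\xi, \cdot)$ be the projection operator onto $\mathrm{cl}(D(\xi))$, and $d(\xi, \cdot)$
(respectively $d(\cdot)$) be the distance function to $D(\xi)$ (respectively to $\mathcal{D}$).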

Theorem 3.1. Assume the following facts:

1. The monotone operator $\mathcal{A}$ is maximal.

2. There exists an integer $p \geq 1$ and a point $x_\star \in Z$ such that $\mathcal{R}_{2p}(x_\star) \neq \emptyset$.

3. For any compact set $K$ of $\mathbb{R}^N$, there exists $\varepsilon \in ]0, 1]$ such that

$$ \sup_{x \in K \cap \mathcal{D}} \int \|A_0(\xi, x)\|^{1+\varepsilon} \, \mu(d\xi) < \infty. $$

Moreover, there exists $y_0 \in \mathcal{D}$ such that

$$ \int \|A_0(\xi, y_0)\|^{1 + 1/\varepsilon} \, \mu(d\xi) < \infty. $$

4. There exists $C > 0$ such that for all $x \in \mathbb{R}^N$,

$$ \int d(\xi, x)^2 \, \mu(d\xi) \geq C \, d(x)^2, $$

and furthermore, $\gamma_{n+1} / \gamma_n \to 1$.

5. There exists $C > 0$ such that for any $x \in \mathbb{R}^N$ and any $\gamma > 0$,

$$ \frac{1}{\gamma^4} \int \|J_\gamma(\xi, x) - \Pi(\xi, x)\|^4 \, \mu(d\xi) \leq C (1 + \|x\|^{2p}), $$

where the integer $p$ is specified in 2.

6. There exists $M : \Xi \to \mathbb{R}_+$ such that $M(\xi)^{2p}$ is $\mu$-integrable, and for all
$x \in \mathbb{R}^N$, $\|b(\xi, x)\| \leq M(\xi)(1 + \|x\|)$. Moreover, there exists a constant
$C > 0$ such that $\int \|b(\xi, x)\|^4 \, \mu(d\xi) \leq C (1 + \|x\|^{2p})$.

Then, the monotone operator $\mathcal{A} + \mathcal{B}$ is maximal. Moreover, with probability
one, the continuous time process $x(t)$ defined by (5) is bounded and is an
APT of the differential inclusion (6).

Let us now discuss our assumptions. Sufficient conditions for the
maximality of $\mathcal{A}$ are provided below in Sections 3.2 and 4.1. Assumption 2 is
relatively weak and easy to check. If we set $\varepsilon = 1$, then Assumption 3 can
be replaced with the stronger condition stating that for any compact set $K$
of $\mathbb{R}^N$,

$$ \sup_{x \in K \cap \mathcal{D}} \int \|A_0(\xi, x)\|^2 \, \mu(d\xi) < \infty. $$

For more insight on the above assumption, let us compare it with the
standard Robbins-Monro algorithm $y_{n+1} = y_n + \gamma_{n+1} H(y_n, \xi_{n+1})$, where $H$ is
some measurable function. In order to ensure the almost-sure boundedness
of $(y_n)$, it is standard to assume that $\|H(y, \xi)\| \leq M(\xi)(1 + \|y\|)$ for every
$(y, \xi)$ and for some square-integrable r.v. $M(\xi)$ [18]. As far as our algorithm
is concerned, a similar assumption is needed on the operator $B$, but on the
other hand, no such assumption is needed on the operator $A$. Assumption 3
is weaker. Otherwise stated, when a random operator is used through its
resolvent, there is no need to require the “linear growth” condition often
assumed in the stochastic approximation literature.

Assumption 4 is quite weak, and is easy to illustrate in the case where
$\mu$ is a finite sum of Dirac measures. Following [19], we say that a finite
collection of closed and convex subsets $(C_1, \dots, C_m)$ over some Euclidean
space is linearly regular if there exists $\kappa > 0$ such that for every $x$,

$$ \max_{i = 1, \dots, m} d(x, C_i) \geq \kappa \, d(x, C), \quad \text{where } C = \bigcap_{i=1}^m C_i, $$

and where implicitly $C \neq \emptyset$. Sufficient conditions for a collection of sets to
satisfy the above condition can be found in [19] and the references therein.
Note that this condition implies the so-called strong conical hull intersection
property $N_C(x) = \sum_{i=1}^m N_{C_i}(x)$ for every $x \in C$, where $N_C(x)$ is, as we recall,
the normal cone to $C$ at the point $x$.
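
The linear regularity inequality can be probed numerically on a simple case. The following Python sketch is an illustration only, under an assumed example that does not come from the paper: $C_1$ and $C_2$ are the two coordinate axes of $\mathbb{R}^2$, so that $C = \{0\}$, and the function name is hypothetical. For this pair, $\max(|x_1|, |x_2|) \geq \|x\| / \sqrt{2}$, hence the inequality holds with $\kappa = 1/\sqrt{2}$.

```python
import math
import random


def linear_regularity_ratio(x):
    """Return max_i d(x, C_i) / d(x, C) for C1 = {x2 = 0}, C2 = {x1 = 0}."""
    d1 = abs(x[1])               # distance to the first axis C1
    d2 = abs(x[0])               # distance to the second axis C2
    d = math.hypot(x[0], x[1])   # distance to the intersection C = {0}
    return max(d1, d2) / d


rng = random.Random(0)
samples = [(rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)) for _ in range(10000)]
# Smallest observed ratio: an empirical estimate (from above) of the best kappa.
worst = min(linear_regularity_ratio(x) for x in samples if x != (0.0, 0.0))
kappa = 1.0 / math.sqrt(2.0)
```

The smallest sampled ratio never falls below `kappa`, and it approaches `kappa` for points close to the diagonals.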

Let us finally discuss Assumption 5. As $\gamma \to 0$, it is known that $J_\gamma(\xi, x)$
converges to $\Pi(\xi, x)$ for every $(\xi, x)$. Moreover, Assumption 5 provides a
control on the convergence rate. The fourth moment of $\|J_\gamma(\xi, x) - \Pi(\xi, x)\|$
is assumed to vanish at the rate $\gamma^4$ with a multiplicative factor of the order
$\|x\|^{2p}$. The integer $p$ can potentially be as large as needed, provided that one
is able to find a zero $x_\star$ satisfying Assumption 2. In the special case where
$A(\xi, \cdot)$ coincides with the subdifferential of the convex function $f(\xi, \cdot)$,
Assumption 5 holds under the sufficient condition that for almost every $\xi$
and for every $x \in D(\xi)$,

$$ \|\partial f_0(\xi, x)\| \leq M'(\xi)(1 + \|x\|^{p/2}), \tag{7} $$

where $\partial f_0(\xi, x)$ is the smallest norm element of the subdifferential of $f(\xi, \cdot)$
at point $x$, and where $M'(\xi)$ is a positive r.v. with a finite fourth moment.
Indeed, in this case, the resolvent $J_\gamma(\xi, x)$ coincides with $\mathrm{prox}_{\gamma f(\xi, \cdot)}(x)$ and
by [7],

$$ \frac{1}{\gamma} \|J_\gamma(\xi, x) - \Pi(\xi, x)\| \leq 2 \|\partial f_0(\xi, \Pi(\xi, x))\|. $$

As a consequence, Assumption 5 stems from (7) and the non-expansiveness
of $\Pi(\xi, \cdot)$.

The results of Theorem 3.1 can first be used to study the convergence of
the sequence $(\bar x_n)$ of empirical means, defined by

$$ \bar x_n := \frac{\sum_{k=1}^n \gamma_k x_k}{\sum_{k=1}^n \gamma_k}. $$

Corollary 3.1. Let the assumptions in the statement of Theorem 3.1 hold
true. Assume that for any $x_\star \in Z$, the set $\mathcal{R}_2(x_\star)$ is not empty. Then, for
any initial value $x_0$, the sequence $(\bar x_n)$ of empirical means converges almost
surely as $n \to \infty$ to a random variable $U$, whose support lies in $Z$.
Let us now consider the issue of the convergence of the sequence $(x_n)$ to a
point of $Z$. Note that the conditions of Theorem 3.1 are generally insufficient
to ensure that $x_n$ converges. A counterexample is obtained by setting $N = 2$
and taking $\mathcal{A}$ as a $\pi/2$-rotation matrix, $\mathcal{B} = 0$ [20, Sec. 6]. However, the
statement will be proved valid when $\mathcal{A} + \mathcal{B}$ is assumed demipositive. We
start by listing some known verifiable conditions ensuring that the maximal
monotone operator $\mathcal{A} + \mathcal{B}$ is demipositive:

1. $\mathcal{A} + \mathcal{B} = \partial G$, where $G \in \Gamma_0$ has a minimum.

2. $\mathcal{A} + \mathcal{B} = I - T$, where $T$ is a non-expansive mapping having a fixed
point.

3. The interior of $Z$ is not empty.

4. $Z \neq \emptyset$ and $\mathcal{A} + \mathcal{B}$ is 3-monotone, i.e., for every triple $(x_i, y_i) \in \mathcal{A} + \mathcal{B}$
for $i = 1, 2, 3$, it holds that $\sum_{i=1}^3 \langle y_i, x_i - x_{i-1} \rangle \geq 0$ by setting $x_0 = x_3$.

5. $\mathcal{A} + \mathcal{B}$ is strongly monotone, i.e., $\langle x_1 - x_2, y_1 - y_2 \rangle \geq \alpha \|x_1 - x_2\|^2$ for
some $\alpha > 0$ and for all $(x_1, y_1)$ and $(x_2, y_2)$ in $\mathcal{A} + \mathcal{B}$.

6. $Z \neq \emptyset$ and $\mathcal{A} + \mathcal{B}$ is cocoercive, i.e., $\langle x_1 - x_2, y_1 - y_2 \rangle \geq \alpha \|y_1 - y_2\|^2$
for some $\alpha > 0$ and for all $(x_1, y_1)$ and $(x_2, y_2)$ in $\mathcal{A} + \mathcal{B}$.

The above conditions can be found in [20]. Specifically, conditions 1-3 can
be found in [9], while Condition 4 can be found in [21]. Conditions 5 and 6
can be easily verified to lead to the demipositivity of $\mathcal{A} + \mathcal{B}$. Condition 1
is further discussed in Section 4.1 below. Condition 2 is satisfied if $Z \neq \emptyset$
and if for any $\xi$, the operator $I - (A + B)(\xi, \cdot)$ is a non-expansive mapping.
Condition 4 is satisfied if $Z \neq \emptyset$ and if all the operators $(A + B)(\xi, \cdot)$ are
3-monotone. The last two conditions are most often easily verifiable.
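
The rotation counterexample mentioned above can be reproduced with a few lines of deterministic Python. The sketch is an illustration under assumed settings that are not from the paper: the operator is the $\pi/2$-rotation $R = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ with no forward term, the starting point is $(1, 0)$, and the step size is $\gamma_n = n^{-0.6}$. Since $(I + \gamma R)^{-1} = (1 + \gamma^2)^{-1}(I - \gamma R)$, each step shrinks the norm by the factor $(1 + \gamma^2)^{-1/2}$ only, and $\sum_n \gamma_n^2 < \infty$ keeps $\|x_n\|$ bounded away from zero, whereas the weighted empirical means tend to the unique zero $0$.

```python
import math


def resolvent_rotation(x, gamma):
    """(I + gamma*R)^{-1} x for the pi/2-rotation R = [[0, -1], [1, 0]]."""
    scale = 1.0 / (1.0 + gamma * gamma)
    return (scale * (x[0] + gamma * x[1]), scale * (x[1] - gamma * x[0]))


def run(n_iter=20000):
    """Proximal iterations x_{n+1} = (I + gamma_{n+1} R)^{-1} x_n and their
    gamma-weighted empirical mean."""
    x = (1.0, 0.0)
    weighted_sum = [0.0, 0.0]
    tau = 0.0
    for n in range(1, n_iter + 1):
        gamma = n ** -0.6  # assumed step size, square-summable but not summable
        x = resolvent_rotation(x, gamma)
        weighted_sum[0] += gamma * x[0]
        weighted_sum[1] += gamma * x[1]
        tau += gamma
    mean = (weighted_sum[0] / tau, weighted_sum[1] / tau)
    return x, mean


x_last, x_bar = run()
norm_last = math.hypot(*x_last)  # stays bounded away from 0
norm_bar = math.hypot(*x_bar)    # decays like 1 / tau_n
```

In this example the identity $\gamma_k R x_k = x_{k-1} - x_k$ gives $\|\bar x_n\| \leq (\|x_0\| + \|x_n\|) / \tau_n$, which explains why the averaged sequence converges although the iterates themselves keep rotating.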

We now have: 

Corollary 3.2. Let the assumptions in the statement of Theorem 3.1 hold
true. Assume in addition that the operator $\mathcal{A} + \mathcal{B}$ is demipositive, and that
for any $x_\star \in Z$, the set $\mathcal{R}_2(x_\star)$ is not empty. Then, for any initial value $x_0$,
there exists a random variable $U$, supported by $Z$, such that $x_n \to U$ almost
surely as $n \to \infty$.

We now address the important problem of the maximality of $\mathcal{A}$.

3.2 Maximality of $\mathcal{A}$

By extending a well-known result on the maximality of the sum of two
maximal monotone operators, it is obvious that $\mathcal{A}$ is maximal in the case
where $\mu$ is a finite sum of Dirac measures and where the interior of $\mathcal{D}$ is not
empty [10, 11]. For more general measures $\mu$, we have the following result.
Proposition 3.1. Assume the following:

1. The interior of $\mathcal{D}$ is not empty, and there exists a closed ball in $\mathcal{D}$ such
that $\|A_0(\xi, x)\| \leq M(\xi)$ for any $x$ in this ball, and such that $M(\xi)$ is
$\mu$-integrable.

2. For any compact set $K$ of $\mathbb{R}^N$, there exists $\varepsilon > 0$ such that

$$ \sup_{x \in K \cap \mathcal{D}} \int \|A_0(\xi, x)\|^{1+\varepsilon} \, \mu(d\xi) < \infty. $$

Moreover, there exists $y_0 \in \mathcal{D}$ such that

$$ \int \|A_0(\xi, y_0)\|^{1 + 1/\varepsilon} \, \mu(d\xi) < \infty. $$

3. There exists $C > 0$ such that for any $x \in \mathbb{R}^N$,

$$ \int d(\xi, x) \, \mu(d\xi) \geq C \, d(x). $$

4. $\int \|J_\gamma(\xi, x) - \Pi(\xi, x)\| \, \mu(d\xi) \leq \gamma \, C(x)$, where $C(x)$ is bounded on
compact sets of $\mathbb{R}^N$.

Then, the monotone operator $\mathcal{A}$ is maximal.

4 Application to Convex Optimization 

We start this section by briefly reproducing some known results related to
the case where $A(\xi, \cdot)$ is the subdifferential of a proper, closed and convex
function $g(\xi, \cdot)$.

4.1 Known Facts About the Aumann Integral of Subdifferentials

A function $g : \Xi \times \mathbb{R}^N \to ]-\infty, \infty]$ is called a normal integrand [22] if the
set-valued mapping $\xi \mapsto \mathrm{epi}\, g(\xi, \cdot)$ is closed-valued and measurable. Let us
assume in addition that $g(\xi, \cdot)$ is convex and proper for every $\xi$.

Consider the case where $A(\xi, \cdot) = \partial g(\xi, \cdot)$. The mean operator $\mathcal{A}$ is given
by² $\mathcal{A}(x) = \int \partial g(\xi, x) \, \mu(d\xi)$. Under some general conditions stated in [24],
the integral and the subdifferential can be exchanged in this expression. In
this case, $\mathcal{A}(x) = \partial G(x)$, where $G(x) = \int g(\xi, x) \, \mu(d\xi)$. This integral is
defined as the sum

$$ \int_{\{\xi : g(\xi, x) \in \mathbb{R}_+\}} g(\xi, x) \, \mu(d\xi) + \int_{\{\xi : g(\xi, x) \in ]-\infty, 0[\}} g(\xi, x) \, \mu(d\xi) + I(x), $$

where

$$ I(x) := \begin{cases} +\infty, & \text{if } \mu(\{\xi : g(\xi, x) = \infty\}) > 0, \\ 0, & \text{otherwise,} \end{cases} $$

and where the convention $(+\infty) + (-\infty) = +\infty$ is used. The function $G$ is
a lower semicontinuous and convex function if $G(x) > -\infty$ for all $x$ [24].
Assuming in addition that $G$ is proper, the identity $\mathcal{A} = \partial G$ ensures that
the operator $\mathcal{A}$ is monotone, maximal, and demipositive, and that the zeros
of $\mathcal{A}$ are the minimizers of $G$.

² By [14, 23], the mapping $A : \Xi \to \mathcal{M}$, defined as $A(\xi, \cdot) = \partial g(\xi, \cdot)$, is measurable in
the sense of Section 2.3.


4.2 A Constrained Optimization Problem 

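Let $(X, \mathscr{X}, \nu)$ be a probability space. Let the functions $f : X \times \mathbb{R}^N \to
]-\infty, \infty[$ and $g : X \times \mathbb{R}^N \to ]-\infty, \infty[$ be normal convex integrands. Here
we assume that $g$ is finite everywhere to simplify the presentation. However
we note that the results can be extended to the case where $g$ is allowed to
take the value $+\infty$. Recall the optimization problem

$$ \min_{x \in C} F(x) + G(x), \qquad C = \bigcap_{i=1}^m C_i, \tag{8} $$

where $F(x) = \int f(\eta, x) \, \nu(d\eta)$, $G(x) = \int g(\eta, x) \, \nu(d\eta)$ and $C_1, \dots, C_m$ are
closed and convex sets. Consider a measurable function $\tilde\nabla f : X \times \mathbb{R}^N \to \mathbb{R}^N$
such that for every $\eta \in X$ and $x \in \mathbb{R}^N$, $\tilde\nabla f(\eta, x)$ is a subgradient of $f(\eta, \cdot)$
at $x$. Let $(v_n)_n$ be an iid sequence on $X$ with probability distribution $\nu$.
Finally, let $(I_n)$ be an iid sequence on $\{0, 1, \dots, m\}$ with distribution $\alpha_i =
\mathbb{P}(I_1 = i) > 0$ for every $i$. We consider the iterates

$$ x_{n+1} = \begin{cases} \mathrm{prox}_{\alpha_0^{-1} \gamma_{n+1} g(v_{n+1}, \cdot)}\big(x_n - \gamma_{n+1} \tilde\nabla f(v_{n+1}, x_n)\big), & \text{if } I_{n+1} = 0, \\ \mathrm{proj}_{C_{I_{n+1}}}\big(x_n - \gamma_{n+1} \tilde\nabla f(v_{n+1}, x_n)\big), & \text{otherwise.} \end{cases} \tag{9} $$

We recall that $\partial g_0(\eta, x)$ is the least norm element of the subdifferential of
$g(\eta, \cdot)$ at $x$. Given $H \subset \mathbb{R}^N$, we use the notation $|H| = \sup\{\|v\| : v \in H\}$.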
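Corollary 4.1. We assume the following. Let $p \geq 1$ be an integer.

1. For every $x \in \mathbb{R}^N$, $\int |f(\eta, x)| \, \nu(d\eta) + \int |g(\eta, x)| \, \nu(d\eta) < \infty$.

2. For any solution $x_\star$ to Problem (8), there exists a measurable function
$M_\star : X \to \mathbb{R}_+$ such that $\int M_\star(\eta)^2 \, \nu(d\eta) < \infty$, and for all $\eta \in X$,

$$ |\partial f(\eta, x_\star)| + |\partial g(\eta, x_\star)| \leq M_\star(\eta). $$

Moreover, there exists a solution $x_\star$ for which $\int M_\star(\eta)^{2p} \, \nu(d\eta) < \infty$.

3. For any compact set $K$ of $\mathbb{R}^N$, there exists $\varepsilon \in ]0, 1]$ such that

$$ \sup_{x \in K} \mathbb{E} \|\partial g_0(\Theta, x)\|^{1+\varepsilon} < \infty. $$

Moreover, there exists $y_0 \in C$ such that $\mathbb{E} \|\partial g_0(\Theta, y_0)\|^{1 + 1/\varepsilon} < \infty$.

4. The closed and convex sets $C_1, \dots, C_m$ are linearly regular, i.e.,

$$ \exists \kappa > 0, \ \forall x \in \mathbb{R}^N, \quad \max_{i = 1, \dots, m} \mathrm{dist}(x, C_i) \geq \kappa \, \mathrm{dist}(x, C), $$

where $\mathrm{dist}(x, S)$ denotes the distance of the point $x$ to the set $S$.
Moreover, $\gamma_n / \gamma_{n+1} \to 1$.

5. There exists $M : X \to \mathbb{R}_+$ such that $\int M(\eta)^{2p} \, \nu(d\eta) < \infty$, and

$$ \forall (\eta, x) \in X \times \mathbb{R}^N, \quad \|\tilde\nabla f(\eta, x)\| \leq M(\eta)(1 + \|x\|). $$

6. There exists $c > 0$ such that $\forall x \in \mathbb{R}^N$, $\int \|\tilde\nabla f(\eta, x)\|^4 \, \nu(d\eta) \leq c (1 + \|x\|^{2p})$.

Then, the sequence $(x_n)$ given by (9) converges almost surely to a solution
to Problem (8).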
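
A small simulation can help visualise iteration (9). The Python sketch below is a toy instance built on assumptions that are not from the paper: $N = 2$, $f(\eta, x) = \frac{1}{2}\|x - \eta\|^2$ with $\eta$ uniform around $c = (2, -2)$, a deterministic $g(x) = \frac{1}{2}\|x\|^2$ whose proximity operator is a scaling, two half-planes $C_1 = \{x_1 \leq 0.5\}$ and $C_2 = \{x_2 \geq 0\}$, uniform selection probabilities $\alpha_i = 1/3$, and $\gamma_n = n^{-0.75}$. For this instance the solution of Problem (8) is $(0.5, 0)$.

```python
import random


def step(x, gamma, eta, index, alpha0):
    """One iteration of (9) for the toy instance described above."""
    # Forward step with the gradient of f(eta, .) = 0.5 * ||. - eta||^2.
    y = [xi - gamma * (xi - ei) for xi, ei in zip(x, eta)]
    if index == 0:
        # prox of (gamma / alpha0) * g with g = 0.5 * ||.||^2 is a scaling.
        t = gamma / alpha0
        return [yi / (1.0 + t) for yi in y]
    if index == 1:
        return [min(y[0], 0.5), y[1]]  # projection onto C1 = {x1 <= 0.5}
    return [y[0], max(y[1], 0.0)]      # projection onto C2 = {x2 >= 0}


def run(n_iter=20000, seed=0):
    rng = random.Random(seed)
    c = (2.0, -2.0)
    x = [0.0, 0.0]
    for n in range(1, n_iter + 1):
        gamma = n ** -0.75                  # assumed step size in l2 \ l1
        eta = [ci + rng.uniform(-0.5, 0.5) for ci in c]
        index = rng.randrange(3)            # I_{n+1} uniform on {0, 1, 2}
        x = step(x, gamma, eta, index, alpha0=1.0 / 3.0)
    return x


x_final = run()
```

The factor $\alpha_0^{-1}$ in the proximal step compensates for the fact that the proximity operator is only applied a fraction $\alpha_0$ of the time, so that the averaged dynamics involve $\partial G$ itself.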

5 Related Works 

The problem of minimizing an objective function in a noisy environment
has brought forth a very rich body of literature in the field of stochastic
approximation [17, 25]. In the framework of this paper, most of this literature
examines the evolution of the projected stochastic gradient or subgradient
algorithm, where the projection is made on a fixed constraining set.

In the case where the constraining set has a complicated structure, an
incremental minimization algorithm with random constraint updates has
been proposed in [26], where a deterministic convex function $f$ is
minimized on a finite intersection of closed and convex constraining sets. The
algorithm developed in [26] consists of a subgradient step over the
objective $f$ followed by an update step towards a randomly chosen constraining
set. Using the same principle, a distributed algorithm involving an
additional consensus step has been proposed in [27]. Random iterations
involving proximal and subgradient operators were considered in [28] and in [29].
In [29], the functions $g(\xi, \cdot)$ are supposed to have a full domain, to satisfy
$\|g(\xi, x) - g(\xi, y)\| \leq L(\|x - y\| + 1)$ for some constant $L$ which does not
depend on $\xi$ and, finally, are such that $\int \|g(\xi, x)\|^2 \, \mu(d\xi) \leq L(1 + \|x\|^2)$. In
the present paper, such conditions are not needed.

The algorithm (4) can also be used to solve a variational inequality
problem. Let $C = \bigcap_{i=1}^m C_i$, where $C_1, \dots, C_m$ are closed and convex sets in
$\mathbb{R}^N$. Consider the problem of finding $x_\star \in C$ that solves the variational
inequality

$$ \forall x \in C, \quad \langle F(x_\star), x - x_\star \rangle \geq 0, $$

where $F : \mathbb{R}^N \to \mathbb{R}^N$ is a monotone single-valued operator on $\mathbb{R}^N$ [30, 31].
Since the projection on $C$ is difficult, one can use the simple stochastic
algorithm $x_{n+1} = \mathrm{proj}_{C_{u_{n+1}}}(x_n - \gamma_{n+1} F(x_n))$, where the random variables $(u_n)$
are distributed on the set $\{1, \dots, m\}$. The variant where $F$ is itself an
expectation can also be considered, i.e., $F(x) = \int f(\xi, x) \, \mu(d\xi)$. The work [30]
addresses this context. In [30], it is assumed that $F$ is strongly monotone and
that the stochastic Lipschitz property $\int \|f(\xi, x) - f(\xi, y)\|^2 \, \mu(d\xi) \leq C \|x - y\|^2$
holds, where $C$ is a positive constant. In our work, the strong monotonicity
of $F$ is not needed, and the Lipschitz property is essentially replaced with
the condition $\|\tilde\nabla f(\xi, x)\| \leq M(\xi)(1 + \|x\|)$, where $\tilde\nabla f(\xi, x)$ is a
subgradient of $f(\xi, \cdot)$ at $x$ (for instance, the least norm one), and $M(\xi)$ satisfies a
moment condition.

In the same vein as our paper, [32] considered a collection $\{A(i, \cdot)\}_{i=1}^N$ of
$N$ maximal monotone operators, and studied the iterations

$$ y_{n+1} \in A(\sigma_{n+1}(1), x_n), \qquad x_{n+1} = \prod_{i=2}^N \big(I + \gamma_{n+1} A(\sigma_{n+1}(i), \cdot)\big)^{-1} (x_n - \gamma_{n+1} y_{n+1}), $$

where $(\gamma_n) \in \ell^2 \setminus \ell^1$, and where $(\sigma_n)$ is a sequence of permutations of the
set $\{1, \dots, N\}$. The convergence of $(\bar x_n)$ to a zero of $\sum_i A(i, \cdot)$ is established
in [32]. In the recent paper [33], a relaxed version of Algorithm (1) is
considered, where $B$ is cocoercive and where its output, as well as the output of
the resolvent of $A$, are subjected to random errors. The convergence of the
iterates to a zero of $A + B$ is established under summability assumptions on
these errors.

Regarding the convergence rate analysis, let us mention [34, 35] which
investigate the performance of the algorithm $x_{n+1} = \mathrm{prox}_{\gamma_{n+1} g}(x_n - \gamma_{n+1} H_{n+1})$,
where $H_{n+1}$ is a noisy estimate of the gradient $\nabla f(x_n)$. The same algorithm
is addressed in [36], where the proximity operator is replaced by the
resolvent of a fixed maximal monotone operator, and $H_{n+1}$ is replaced by a noisy
version of a (single-valued) cocoercive operator evaluated at $x_n$. The paper
[37] addresses the statistical analysis of the empirical means of the estimates
obtained from the random proximal point algorithm.

This paper follows the line of thought of the recent paper [7], which studies
the behavior of the random iterates $x_{n+1} = J_{\gamma_{n+1}}(u_{n+1}, x_n)$ in a Hilbert
space, and establishes the convergence of the empirical means $\bar x_n$ towards
a zero of the mean operator $\mathcal{A}(x) = \int A(\xi, x) \, \mu(d\xi)$. In the present paper,
the proximal point algorithm is replaced with the more general forward-
backward algorithm. Thanks to the dynamic approach developed here, the
convergences of both $(\bar x_n)$ and $(x_n)$ are studied.




Finally, it is worth noting that apart from the APT of Benaïm and
Hirsch [3], many authors have introduced alternative concepts to analyze
the asymptotic behavior of perturbed solutions to evolution systems. An
important one is the notion of almost-orbit of [38, 39], and [40], which has
been shown to be useful to analyze certain perturbed solutions to differential
inclusions of the form (3). The almost-orbit property is however more
demanding than the APT property, and is in general harder to verify, although
it can lead to finer convergence results. Fortunately, the concept of APT
has been proven sufficient here to guarantee that the interpolated process
$x(t)$ almost surely inherits both the ergodic and non-ergodic convergence
properties of the orbits of $\Psi$.


6 Proofs 

Let us start with the proof of Proposition 3.1 because it contains many 
elements of the proof of the main theorem. 


6.1 Proof of Proposition 3.1 

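We recall that for any $\xi \in \Xi$ and any $\gamma > 0$, the Yosida approximation
$A_\gamma(\xi, \cdot)$ is a single-valued $\gamma^{-1}$-Lipschitz monotone operator defined on
$\mathbb{R}^N$. As a consequence, the operator $\mathcal{A}^\gamma$, given by
$\mathcal{A}^\gamma(x) = \int A_\gamma(\xi, x)\,\mu(d\xi)$, is a single-valued, continuous, and monotone
operator defined on $\mathbb{R}^N$. As such, $\mathcal{A}^\gamma$ is maximal [10, Prop. 2.4]. Thus, given any
$y \in \mathbb{R}^N$, there exists $x^\gamma \in \mathbb{R}^N$ such that $y = x^\gamma + \mathcal{A}^\gamma(x^\gamma)$. We shall find a
sequence $\gamma_n \to 0$ such that $x^{\gamma_n} \to x^\star \in \mathcal{D}$ with $y - x^\star \in \mathcal{A}(x^\star)$. The
maximality of $\mathcal{A}$ then follows by Minty's theorem [10].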
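As a small illustration of the objects recalled above, the following sketch computes the resolvent and the Yosida approximation of one concrete maximal monotone operator, the subdifferential of $|x|$ on the real line. This operator is chosen here only as an example and is not taken from the paper. In this case $A_\gamma(x) = (x - J_\gamma(x))/\gamma$ is simply $x/\gamma$ clipped to $[-1, 1]$, which makes the $\gamma^{-1}$-Lipschitz and monotone properties easy to check numerically.

```python
def resolvent_abs(x, gamma):
    """J_gamma = (I + gamma * A)^{-1} for A = d|.| (the soft threshold)."""
    sign = 1.0 if x >= 0 else -1.0
    return sign * max(abs(x) - gamma, 0.0)


def yosida_abs(x, gamma):
    """Yosida approximation A_gamma(x) = (x - J_gamma(x)) / gamma."""
    return (x - resolvent_abs(x, gamma)) / gamma
```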

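Let $z_0$ and $\rho$ be respectively the centre and the radius of the ball referred to
in Assumption 1, and set

$$u(\xi) = z_0 + \rho\,\frac{A_\gamma(\xi, x^\gamma)}{\|A_\gamma(\xi, x^\gamma)\|},$$

where the convention $0/0 = 0$ is used. By the monotonicity of $A_\gamma(\xi, \cdot)$,

$$0 \le \int \langle x^\gamma - u(\xi),\ A_\gamma(\xi, x^\gamma) - A_\gamma(\xi, u(\xi)) \rangle\, \mu(d\xi).$$

Writing $C = \int M(\xi)\,\mu(d\xi) < \infty$ (see Assumption 1), we obtain

$$\int \langle x^\gamma, A_\gamma(\xi, x^\gamma)\rangle\,\mu(d\xi) = \langle x^\gamma, y\rangle - \|x^\gamma\|^2,$$

$$\int \langle -u(\xi), A_\gamma(\xi, x^\gamma)\rangle\,\mu(d\xi) = \langle z_0, x^\gamma - y\rangle - \rho \int \|A_\gamma(\xi, x^\gamma)\|\,\mu(d\xi),$$

$$\Big|\int \langle x^\gamma, A_\gamma(\xi, u(\xi))\rangle\,\mu(d\xi)\Big| \le \|x^\gamma\| \int \|A_0(\xi, u(\xi))\|\,\mu(d\xi) \le C\|x^\gamma\|,$$

$$\Big|\int \langle u(\xi), A_\gamma(\xi, u(\xi))\rangle\,\mu(d\xi)\Big| \le C(\|z_0\| + \rho).$$

Therefore

$$\rho \int \|A_\gamma(\xi, x^\gamma)\|\,\mu(d\xi) + \|x^\gamma\|^2 \le \|x^\gamma\|\,(\|y\| + \|z_0\| + C) + C(\|z_0\| + \rho) + \|z_0\|\,\|y\|.$$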


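This shows that the sets $\{\|x^\gamma\|\}$ and $\{\int \|A_\gamma(\xi, x^\gamma)\|\,\mu(d\xi)\}$ are both bounded.
Writing $A_\gamma(\xi, x^\gamma) = \gamma^{-1}(\Pi(\xi, x^\gamma) - J_\gamma(\xi, x^\gamma)) + \gamma^{-1}(x^\gamma - \Pi(\xi, x^\gamma))$, and using
Assumption 4, we obtain that the set $\{\gamma^{-1}\int \|x^\gamma - \Pi(\xi, x^\gamma)\|\,\mu(d\xi)\}$ is
bounded. By Assumption 3, $\{d(x^\gamma)/\gamma\}$ is bounded. Given $x^\gamma$, let us choose
$\tilde x^\gamma \in \mathcal{D}$ such that $\|x^\gamma - \tilde x^\gamma\| \le 2 d(x^\gamma)$. By the boundedness of $\{\|x^\gamma\|\}$, there
exists a compact set $K \subset \mathbb{R}^N$ such that $\tilde x^\gamma \in K$. Associating a positive
number $\varepsilon$ to $K$ as in Assumption 2, we obtain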

$$\int \|A_\gamma(\xi, x^\gamma)\|^{1+\varepsilon}\,\mu(d\xi) \le \int \Big(\|A_0(\xi, \tilde x^\gamma)\| + \frac{2\,d(x^\gamma)}{\gamma}\Big)^{1+\varepsilon}\mu(d\xi),$$



which is bounded by a constant independent of $\gamma$ thanks to Assumption 2.
Thus, the family of $\Xi \to \mathbb{R}^N$ functions $\{A_\gamma(\cdot, x^\gamma)\}$ is bounded in the Banach
space $L^{1+\varepsilon}(\Xi, \mathcal{T}, \mu; \mathbb{R}^N)$.

Let us take a sequence $(\gamma_n, x^{\gamma_n})$ converging to $(0, x^\star)$. Let us extract a
subsequence (still denoted as $(n)$) from the sequence of indices $(n)$, in such
a way that $(A_{\gamma_n}(\cdot, x^{\gamma_n}))_n$ converges weakly in $L^{1+\varepsilon}$ towards a function $f(\xi)$.

By Mazur's theorem, there exists a function $J : \mathbb{N} \to \mathbb{N}$ and a sequence of
sets of weights $(\{\alpha_{k,n},\ k = n, \ldots, J(n) : \alpha_{k,n} \ge 0,\ \sum_{k=n}^{J(n)} \alpha_{k,n} = 1\})_n$ such
that the sequence of functions $(g_n(\xi) = \sum_{k=n}^{J(n)} \alpha_{k,n} A_{\gamma_k}(\xi, x^{\gamma_k}))$ converges
strongly to $f$ in $L^{1+\varepsilon}$. Taking a further subsequence, we obtain the
$\mu$-almost everywhere convergence of $(g_n)$ to $f$.

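Observe that $x^\star \in \mathrm{cl}(\mathcal{D})$ since $d(x^{\gamma_n}) \to 0$. Choose a sequence $(z_n)$ in $\mathcal{D}$
that converges to $x^\star$, and for each $n$, let $\Gamma_n = \{\xi \in \Xi : z_n \in D(\xi)\}$. Then,
on the probability one set $\Gamma = \cap_n \Gamma_n$, it holds that $x^\star \in \mathrm{cl}(D(\xi))$. On the
intersection of $\Gamma$ and the set where $g_n \to f$, set $\eta_n(\xi) = J_{\gamma_n}(\xi, x^{\gamma_n}) - x^\star$
and write

$$\|\eta_n(\xi)\| \le \|J_{\gamma_n}(\xi, x^{\gamma_n}) - J_{\gamma_n}(\xi, x^\star)\| + \|J_{\gamma_n}(\xi, x^\star) - x^\star\|.$$

Since $J_{\gamma_n}(\xi, \cdot)$ is non-expansive and since $x^\star \in \mathrm{cl}(D(\xi))$, we have $\eta_n(\xi) \to 0$.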

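Considering Assumption 2, we also have

$$\|\eta_n(\xi)\| \le \|x^\star\| + \|J_{\gamma_n}(\xi, x^{\gamma_n}) - J_{\gamma_n}(\xi, y_0)\| + \|J_{\gamma_n}(\xi, y_0) - y_0\| + \|y_0\|$$
$$\le \|x^\star\| + \sup_\gamma \|x^\gamma\| + 2\|y_0\| + \|A_0(\xi, y_0)\|,$$

when $\gamma_n \le 1$. By Assumption 2 and the dominated convergence theorem,
we obtain that $\eta_n \to 0$ in $L^{1+1/\varepsilon}$. With this in mind,

$$\int \|\eta_n(\xi)\|\,\|A_{\gamma_n}(\xi, x^{\gamma_n})\|\,\mu(d\xi) \le \Big(\int \|\eta_n(\xi)\|^{1+1/\varepsilon}\mu(d\xi)\Big)^{\varepsilon/(1+\varepsilon)} \Big(\int \|A_{\gamma_n}(\xi, x^{\gamma_n})\|^{1+\varepsilon}\mu(d\xi)\Big)^{1/(1+\varepsilon)},$$

and the left-hand side converges to zero. Consequently, the random variable

$$e_n = \sum_{k=n}^{J(n)} \alpha_{k,n}\,\langle J_{\gamma_k}(\xi, x^{\gamma_k}) - x^\star,\ A_{\gamma_k}(\xi, x^{\gamma_k})\rangle$$

converges to zero in probability, hence in the $\mu$-almost sure sense along a
subsequence. Fix $\xi$ in this new probability one set, choose arbitrarily a
couple $(u, v) \in A(\xi, \cdot)$, and write

$$\chi_n = \sum_{k=n}^{J(n)} \alpha_{k,n}\,\langle u - J_{\gamma_k}(\xi, x^{\gamma_k}),\ v - A_{\gamma_k}(\xi, x^{\gamma_k})\rangle.$$

It holds by the monotonicity of $A(\xi, \cdot)$ that $\chi_n \ge 0$. Writing

$$\chi_n = \langle u - x^\star,\ v - g_n(\xi)\rangle + e_n - \sum_{k=n}^{J(n)} \alpha_{k,n}\,\langle \eta_k(\xi), v\rangle,$$

and making $n \to \infty$, we obtain that $\langle u - x^\star, v - f(\xi)\rangle \ge 0$. By the maximality
of $A(\xi, \cdot)$, it holds that $(x^\star, f(\xi)) \in A(\xi, \cdot)$.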

To conclude, we have

$$y = \sum_{k=n}^{J(n)} \alpha_{k,n}\, x^{\gamma_k} + \int g_n(\xi)\,\mu(d\xi),$$

$\sum_{k=n}^{J(n)} \alpha_{k,n}\, x^{\gamma_k} \to x^\star \in \mathcal{D}$, and $g_n \to f \in \mathcal{S}^1_{A(\cdot, x^\star)}$. Making $n \to \infty$, we
obtain $y - x^\star = \int f(\xi)\,\mu(d\xi) \in \mathcal{A}(x^\star)$, which is the desired result. □

6.2 Proof of Theorem 3.1


Noting that $\mathrm{dom}\,\mathcal{B} = \mathbb{R}^N$ and using Assumption 6 of Theorem 3.1, one can
check that the assumptions of Proposition 3.1 are satisfied for $B$. The result
is that $\mathcal{B}$ is maximal. Because $\mathcal{B}$ has a full domain and $\mathcal{A}$ is maximal, $\mathcal{A} + \mathcal{B}$ is
maximal by [11, Corollary 24.4]. Thus, the first assertion of Theorem 3.1 is
shown, and moreover, the differential inclusion (6) admits a unique solution,
and the associated semiflow is well defined.

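Defining $Y_\gamma(\xi, x) := A_\gamma(\xi, x - \gamma b(\xi, x))$, the iterates can be rewritten
as

$$x_{n+1} = x_n - \gamma_{n+1}\, b(u_{n+1}, x_n) - \gamma_{n+1}\, Y_{\gamma_{n+1}}(u_{n+1}, x_n)$$
$$= x_n - \gamma_{n+1}\, h_{\gamma_{n+1}}(x_n) + \gamma_{n+1}\, \eta_{n+1},$$

where we define

$$h_\gamma(x) := \int \big(Y_\gamma(\xi, x) + b(\xi, x)\big)\,\mu(d\xi),$$

and

$$\eta_{n+1} := -Y_{\gamma_{n+1}}(u_{n+1}, x_n) + \mathbb{E}_n Y_{\gamma_{n+1}}(u_{n+1}, x_n) - b(u_{n+1}, x_n) + \mathbb{E}_n b(u_{n+1}, x_n),$$

where $\mathbb{E}_n$ denotes the expectation conditionally to the sub $\sigma$-field $\sigma(u_1, \ldots, u_n)$
of $\mathcal{F}$ (we also write $\mathbb{E}_0 = \mathbb{E}$). Consider the martingale

$$M_n := \sum_{k=1}^{n} \gamma_k \eta_k,$$

and let $M(t)$ be the affine interpolated process, defined for any $n \in \mathbb{N}$ and
any $t \in [\tau_n, \tau_{n+1}[$ as

$$M(t) := M_n + \eta_{n+1}(t - \tau_n) = M_n + \frac{M_{n+1} - M_n}{\gamma_{n+1}}\,(t - \tau_n).$$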

For any $t \ge 0$, let

$$r(t) := \max\{k \ge 0 : \tau_k \le t\}.$$
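The index map $r(t)$ and the affine interpolation can be sketched in a few lines. The snippet below is only an illustration for scalar iterates: it assumes $\tau_0 = 0$ and $\tau_n = \gamma_1 + \cdots + \gamma_n$, and the helper name `make_interpolation` is hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def make_interpolation(gammas, iterates):
    """Build r(t) and the affine interpolation x(t) from steps and iterates.

    gammas = [gamma_1, ..., gamma_N], iterates = [x_0, ..., x_N] (scalars).
    Assumes tau_0 = 0 and tau_n = gamma_1 + ... + gamma_n.
    """
    taus = [0.0] + list(accumulate(gammas))

    def r(t):
        # Largest index k such that tau_k <= t.
        return bisect_right(taus, t) - 1

    def x(t):
        n = min(r(t), len(gammas) - 1)  # clamp so that t = tau_N is allowed
        slope = (iterates[n + 1] - iterates[n]) / gammas[n]  # gammas[n] is gamma_{n+1}
        return iterates[n] + slope * (t - taus[n])

    return taus, r, x
```

On each interval $[\tau_n, \tau_{n+1}[$ the interpolated path moves from $x_n$ towards $x_{n+1}$ at constant speed, which is the same construction as the one used for $M(t)$.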

Then, for any $t \ge 0$, we obtain

$$x(\tau_n + t) - x(\tau_n) = -\int_0^t h_{\gamma_{r(\tau_n+s)+1}}(x_{r(\tau_n+s)})\,ds + M(\tau_n + t) - M(\tau_n)$$
$$= H(\tau_n + t) - H(\tau_n) + M(\tau_n + t) - M(\tau_n), \qquad (10)$$

where $H(t) := -\int_0^t h_{\gamma_{r(s)+1}}(x_{r(s)})\,ds$. The idea of the proof is to establish that
on a $\mathbb{P}$-probability one set, the sequence $(x(\tau_n + \cdot))_{n \in \mathbb{N}}$ of continuous time
processes is equicontinuous and bounded. The accumulation points for the
uniform convergence on a compact interval $[0, T]$ (which are guaranteed to
exist by the Arzelà-Ascoli theorem) will be shown to have the form

$$z(t) - z(0) = -\lim_{n \to \infty} \int_0^t ds \int_\Xi \mu(d\xi)\,\big(Y_{\gamma_{r(\tau_n+s)+1}}(\xi, x_{r(\tau_n+s)}) + b(\xi, x_{r(\tau_n+s)})\big), \qquad (11)$$

where the limit is taken over a subsequence. We then show that the sequence
of $\Xi \times [0, T] \to \mathbb{R}^{2N}$ functions $\big((\xi, s) \mapsto (Y_{\gamma_{r(\tau_n+s)+1}}(\xi, x_{r(\tau_n+s)}),\ b(\xi, x_{r(\tau_n+s)}))\big)_n$
is bounded in the Banach space $L^{1+\varepsilon}(\Xi \times [0, T], \mu \otimes \lambda)$, where $\lambda$ is the
Lebesgue measure on $[0, T]$. Analyzing the accumulation points and following
an approach similar to the one used in the proof of Proposition 3.1, we
prove that the limit in the right-hand side of (11) coincides with

$$z(t) - z(0) = -\int_0^t ds \Big(\int_\Xi f^{(a)}(\xi, s)\,\mu(d\xi) + \int_\Xi f^{(b)}(\xi, s)\,\mu(d\xi)\Big),$$

where for almost every $s \in [0, T]$, $f^{(a)}(\cdot, s)$ and $f^{(b)}(\cdot, s)$ are integrable
selections of $A(\cdot, z(s))$ and $B(\cdot, z(s))$ respectively. This shows that $z$ satisfies the
differential inclusion (6). Hence, almost surely, the accumulation points of
the sequence of processes $(x(\tau_n + \cdot))_{n \in \mathbb{N}}$ are solutions to (6). Recalling that
the latter defines a semiflow $\Phi : \mathrm{cl}(\mathcal{D}) \times \mathbb{R}_+ \to \mathrm{cl}(\mathcal{D})$, it follows that the
process $x(t)$ is a.s. an APT of (6).

Throughout the proof, $C$ refers to a positive constant, that can change
from line to line, but that remains independent of $n$. We use $c$, $c_1$, etc. to
denote random variables on $\Omega \to \mathbb{R}_+$ that do not depend on $n$. For a fixed
event $\omega \in \Omega$, these will act as constants.

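Proposition 6.1. Let Assumptions 2 and 6 of Theorem 3.1 hold true.
Then,

1. The sequence $(x_n)$ is bounded almost surely and in $L^2(\Omega, \mathcal{F}, \mathbb{P}; \mathbb{R}^N)$.

2. $\mathbb{E}\big[\sum_n \gamma_n^2 \int \|Y_{\gamma_n}(\xi, x_n)\|^2\,\mu(d\xi)\big] < \infty$.

3. The sequence $(\|x_n - x^\star\|)_n$ converges almost surely.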

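Proof. Writing $\|x_{n+1} - x^\star\|^2 = \|x_n - x^\star\|^2 + 2\langle x_{n+1} - x_n, x_n - x^\star\rangle + \|x_{n+1} - x_n\|^2$,
we obtain

$$\|x_{n+1} - x^\star\|^2 = \|x_n - x^\star\|^2 - 2\gamma_{n+1}\langle Y_{\gamma_{n+1}}(u_{n+1}, x_n), x_n - x^\star\rangle$$
$$\quad - 2\gamma_{n+1}\langle b(u_{n+1}, x_n), x_n - x^\star\rangle + \gamma_{n+1}^2\|b(u_{n+1}, x_n) + Y_{\gamma_{n+1}}(u_{n+1}, x_n)\|^2.$$

Thanks to Assumption 2, we can choose $\varphi \in \mathcal{S}^2_{A(\cdot, x^\star)}$ and $\psi \in \mathcal{S}^2_{B(\cdot, x^\star)}$ such
that $0 = \int (\varphi + \psi)\,d\mu$. Writing $u = u_{n+1}$, $\gamma = \gamma_{n+1}$, $Y_\gamma = Y_{\gamma_{n+1}}(u_{n+1}, x_n)$,
$J_\gamma = J_{\gamma_{n+1}}(u_{n+1}, x_n - \gamma_{n+1} b(u_{n+1}, x_n))$, and $b = b(u_{n+1}, x_n)$ for conciseness,
and recalling that $Y_\gamma = (x_n - \gamma b - J_\gamma)/\gamma$, we write

$$\langle Y_\gamma, x_n - x^\star\rangle = \langle Y_\gamma - \varphi(u), J_\gamma - x^\star\rangle + \gamma\langle Y_\gamma - \varphi(u), Y_\gamma\rangle + \gamma\langle Y_\gamma - \varphi(u), b\rangle + \langle \varphi(u), x_n - x^\star\rangle$$
$$\ge \gamma\|Y_\gamma\|^2 - \gamma\langle \varphi(u), Y_\gamma\rangle + \gamma\langle Y_\gamma - \varphi(u), b\rangle + \langle \varphi(u), x_n - x^\star\rangle,$$

since $Y_\gamma \in A(u, J_\gamma)$ and $A(u, \cdot)$ is monotone. By the monotonicity of $B(u, \cdot)$,
we also have $\langle b, x_n - x^\star\rangle \ge \langle \psi(u), x_n - x^\star\rangle$. By expanding $\gamma^2\|b + Y_\gamma\|^2$, we
obtain altogether

$$\|x_{n+1} - x^\star\|^2 \le \|x_n - x^\star\|^2 - \gamma^2\|Y_\gamma\|^2 + 2\gamma^2\langle \varphi(u), Y_\gamma\rangle + 2\gamma^2\langle \varphi(u), b\rangle + \gamma^2\|b\|^2 - 2\gamma\langle \varphi(u) + \psi(u), x_n - x^\star\rangle$$
$$\le \|x_n - x^\star\|^2 - \gamma^2(1 - \beta^{-1})\|Y_\gamma\|^2 + \gamma^2(1 + \beta^{-1})\|b\|^2 + 2\gamma^2\beta\|\varphi(u)\|^2 - 2\gamma\langle \varphi(u) + \psi(u), x_n - x^\star\rangle, \qquad (12)$$

where we used the inequality $|\langle a, b\rangle| \le (\beta/2)\|a\|^2 + \|b\|^2/(2\beta)$, where $\beta > 0$
is arbitrary. By Assumption 6,

$$\mathbb{E}_n\|b\|^2 \le C(1 + \|x_n\|^2) \le 2C(1 + \|x^\star\|^2 + \|x_n - x^\star\|^2)$$

for some (other) constant $C$. Moreover $\mathbb{E}_n\langle \varphi(u) + \psi(u), x_n - x^\star\rangle = 0$. Thus,

$$\mathbb{E}_n\|x_{n+1} - x^\star\|^2 \le (1 + C\gamma_{n+1}^2)\|x_n - x^\star\|^2 - \gamma_{n+1}^2(1 - \beta^{-1})\int \|Y_{\gamma_{n+1}}(\xi, x_n)\|^2\,\mu(d\xi) + C\gamma_{n+1}^2.$$

Choose $\beta > 1$. Using the Robbins-Siegmund Lemma [41] along with $(\gamma_n) \in \ell^2$,
the conclusion follows. □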

Remark 1. This proposition calls for some comments. In the standard
forward-backward algorithm described in the introduction of this paper, the
operators A and B are both deterministic, and B is a single-valued operator
satisfying a so-called cocoercivity property. In these conditions, the iteration
(1) belongs to the class of the so-called Krasnosel'skiĭ-Mann iterations,
provided the fixed step size $\gamma$ is chosen small enough [11]. A well known
property of these iterations is that the sequence $(x_n)$ is Fejér monotone with
respect to $Z(A + B)$. Specifically, for all $x^\star \in Z(A + B)$, $(\|x_n - x^\star\|)$ is
decreasing. In our situation, the forward operators $B(\xi, \cdot)$ are not required to
be single-valued. On the other hand, Assumptions 2 and 6 are needed along
with the fact that $(\gamma_n) \in \ell^2$. Instead of the Fejér monotonicity, we obtain
the weaker result given by Proposition 6.1-3.
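The Fejér monotonicity mentioned in this remark is easy to observe on a deterministic toy problem. The sketch below is an illustration under assumed choices that are not part of the paper: $B(x) = x - 3$ (which is 1-cocoercive), $A = \partial|\cdot|$, and a fixed step $\gamma = 0.5$. The unique zero of $A + B$ is then $x^\star = 2$.

```python
def soft_threshold(x, t):
    """Resolvent of t * d|.| at x."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def forward_backward(x0, gamma, steps, target=3.0):
    """Deterministic iteration x_{n+1} = J_gamma(x_n - gamma * B(x_n)).

    Assumed toy operators: B(x) = x - target and A = d|.|.
    Returns the list [x_0, x_1, ..., x_steps].
    """
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(soft_threshold(x - gamma * (x - target), gamma))
    return xs
```

Computing the distances $|x_n - 2|$ along the returned list shows a non-increasing sequence, which is the deterministic property that Proposition 6.1-3 replaces by almost sure convergence of $(\|x_n - x^\star\|)$.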

The following lemma provides a moment control over the iterates $x_n$.

Lemma 6.1. Let Assumptions 2 and 6 in the statement of Theorem 3.1
hold true. Then, $\sup_n \mathbb{E}\|x_n\|^{2p} < \infty$.

Proof. We shall establish the result by induction over $p$. Proposition 6.1
shows that it holds for $p = 1$. Assume that it holds for $p - 1$. Using
Assumption 2, choose $\varphi \in \mathcal{S}^{2p}_{A(\cdot, x^\star)}$ and $\psi \in \mathcal{S}^{2p}_{B(\cdot, x^\star)}$ such that
$0 = \int (\varphi + \psi)\,d\mu$. Inequality (12) shows that for some constant $C > 0$,

$$\|x_{n+1} - x^\star\|^2 \le \|x_n - x^\star\|^2 - 2\gamma_{n+1}\langle \varphi(u_{n+1}) + \psi(u_{n+1}), x_n - x^\star\rangle + C\gamma_{n+1}^2\big(\|\varphi(u_{n+1})\|^2 + \|b(u_{n+1}, x_n)\|^2\big).$$

Raising both sides to the power $p$ then taking their expectations, we obtain

$$\mathbb{E}\|x_{n+1} - x^\star\|^{2p} \le \sum_{k_1 + k_2 + k_3 = p} \binom{p}{k_1, k_2, k_3}\, C^{k_2} (-2)^{k_3}\, \gamma_{n+1}^{2k_2 + k_3}\, T_n^{(k_1, k_2, k_3)}, \qquad (13)$$

where we set for every $k = (k_1, k_2, k_3)$,

$$T_n^{k} = \mathbb{E}\Big[\|x_n - x^\star\|^{2k_1}\big(\|\varphi(u_{n+1})\|^2 + \|b(u_{n+1}, x_n)\|^2\big)^{k_2}\,\langle \varphi(u_{n+1}) + \psi(u_{n+1}), x_n - x^\star\rangle^{k_3}\Big].$$

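We can make the following observations:

• By choosing $k_2 = k_3 = 0$, we observe that $\mathbb{E}\|x_{n+1} - x^\star\|^{2p}$ is no greater
than $\mathbb{E}\|x_n - x^\star\|^{2p}$ plus some additional terms involving only smaller
powers of $\|x_n - x^\star\|$.

• The term corresponding to $(k_1, k_2, k_3) = (p - 1, 0, 1)$ is zero since $u_{n+1}$
and $\sigma(u_1, \ldots, u_n)$ are independent and $\mathbb{E}_n\langle \varphi(u_{n+1}) + \psi(u_{n+1}), x_n - x^\star\rangle = 0$.
This implies that any term in the sum except $\mathbb{E}\|x_n - x^\star\|^{2p}$
is multiplied by $\gamma_{n+1}$ raised to a power at least equal to 2.

• Consider the case $(k_1, k_2, k_3) \ne (p - 1, 0, 1)$ and $(k_1, k_2, k_3) \ne (p, 0, 0)$.
Using Jensen's inequality and the inequality $x^k y^\ell \le x^{k+\ell} + y^{k+\ell}$ for
non-negative $x$, $y$, $k$ and $\ell$, we get

$$|T_n^k| \le \mathbb{E}\Big[\|x_n - x^\star\|^{2k_1 + k_3}\big(\|\varphi(u_{n+1})\|^2 + \|b(u_{n+1}, x_n)\|^2\big)^{k_2}\,\|\varphi(u_{n+1}) + \psi(u_{n+1})\|^{k_3}\Big]$$
$$\le C\,\mathbb{E}\Big[\|x_n - x^\star\|^{2k_1 + k_3}\big(\|\varphi(u_{n+1})\|^{2k_2} + \|b(u_{n+1}, x_n)\|^{2k_2}\big)\big(\|\varphi(u_{n+1})\|^{k_3} + \|\psi(u_{n+1})\|^{k_3}\big)\Big]$$
$$\le C\,\mathbb{E}\big[\|x_n - x^\star\|^{2k_1 + k_3}\big]\;\mathbb{E}\big[\|\varphi(u_{n+1})\|^{2k_2 + k_3} + \|\psi(u_{n+1})\|^{2k_2 + k_3}\big]$$
$$\quad + C\,\mathbb{E}\big[\|x_n - x^\star\|^{2k_1 + k_3}\,\|b(u_{n+1}, x_n)\|^{2k_2 + k_3}\big].$$

By conditioning on $\sigma(u_1, \ldots, u_n)$ and by using Assumption 6, we get

$$\mathbb{E}\big[\|x_n - x^\star\|^{2k_1 + k_3}\,\|b(u_{n+1}, x_n)\|^{2k_2 + k_3}\big] \le C\,\mathbb{E}\big[\|x_n - x^\star\|^{2k_1 + k_3}(1 + \|x_n\|^{2k_2 + k_3})\big] \le C(\mathbb{E}\|x_n - x^\star\|^{2p} + 1).$$

Noting that $2k_1 + k_3 \le 2(p - 1)$, we get that $\mathbb{E}\|x_n - x^\star\|^{2k_1 + k_3} \le C$ by
the induction hypothesis. Since $2k_2 + k_3 \le 2p$ and since $\varphi$ and $\psi$ are
$2p$-integrable selections, it follows that $|T_n^k| \le C(1 + \mathbb{E}\|x_n - x^\star\|^{2p})$.
Note also that in the considered case, one has $2k_2 + k_3 \ge 2$, which
implies that all terms are multiplied by $\gamma_{n+1}^2$.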
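In conclusion, we obtain that

$$\mathbb{E}\|x_{n+1} - x^\star\|^{2p} \le (1 + C\gamma_{n+1}^2)\,\mathbb{E}\|x_n - x^\star\|^{2p} + C\gamma_{n+1}^2$$

for some constant $C > 0$. Starting from $n = 0$ and iterating, we obtain that
$\sup_n \mathbb{E}\|x_n - x^\star\|^{2p} < \infty$. □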


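We now need to control the distances to $\mathcal{D}$ of the iterates $x_n$. Let us
start with an easy technical result, whose proof is left to the reader.

Lemma 6.2. For any $\varepsilon > 0$, there exist $C(\varepsilon) > 0$ and $C'(\varepsilon) > 0$ such that
for any vectors $x, y \in \mathbb{R}^N$,

$$\|x + y\|^2 \le (1 + \varepsilon)\|x\|^2 + C(\varepsilon)\|y\|^2, \quad\text{and}\quad \|x + y\|^4 \le (1 + \varepsilon)\|x\|^4 + C'(\varepsilon)\|y\|^4.$$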
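Since the proof is left to the reader, a quick numerical sanity check may help. The sketch below uses one admissible choice of constants, which is an assumption of this example rather than the paper's: $C(\varepsilon) = 1 + 1/\varepsilon$ (from $2\langle x, y\rangle \le \varepsilon\|x\|^2 + \|y\|^2/\varepsilon$), and $C'(\varepsilon) = (1 + 1/e)^3$ with $e$ defined by $(1 + e)^3 = 1 + \varepsilon$ (obtained by applying the quadratic bound twice).

```python
def c2(eps):
    """One valid constant C(eps) for the squared-norm inequality."""
    return 1.0 + 1.0 / eps


def c4(eps):
    """One valid constant C'(eps) for the fourth-power inequality."""
    e = (1.0 + eps) ** (1.0 / 3.0) - 1.0  # so that (1 + e)^3 = 1 + eps
    return (1.0 + 1.0 / e) ** 3


def sq_norm(v):
    return sum(c * c for c in v)


def check_lemma(x, y, eps):
    """Return True if both inequalities hold for x, y (up to rounding)."""
    s = sq_norm([a + b for a, b in zip(x, y)])
    rhs2 = (1.0 + eps) * sq_norm(x) + c2(eps) * sq_norm(y)
    rhs4 = (1.0 + eps) * sq_norm(x) ** 2 + c4(eps) * sq_norm(y) ** 2
    tol = 1e-9
    return s <= rhs2 + tol * (1.0 + rhs2) and s * s <= rhs4 + tol * (1.0 + rhs4)
```

The quadratic bound is tight when $y = \varepsilon x$, so the constant $1 + 1/\varepsilon$ cannot be improved for that inequality.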

Proposition 6.2. Let Assumptions 2, 4, 5, and 6 of Theorem 3.1 hold
true. Then, $d(x_n)$ tends a.s. to zero. Moreover, for every $\omega$ in a probability
one set, there exists $c(\omega) > 0$ and a positive sequence $(c_m(\omega))_{m \in \mathbb{N}}$ converging
to zero such that for every integer $n$ and every integer $m$ such that $n \ge m$,

$$\sum_{k=m+1}^{n} \frac{d(x_k)^2}{\gamma_k} \le c_m(\omega) + c(\omega) \sum_{k=m+1}^{n} \gamma_k.$$


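Proof. We start by writing $x_{n+1} = \Pi(u_{n+1}, x_n) + \gamma_{n+1}\delta_{n+1}$, where

$$\delta_{n+1} = \frac{J_{\gamma_{n+1}}(u_{n+1}, x_n - \gamma_{n+1} b(u_{n+1}, x_n)) - \Pi(u_{n+1}, x_n)}{\gamma_{n+1}}.$$

Upon noting that $J_\gamma(\xi, \cdot)$ is non-expansive for every $\gamma > 0$, we have

$$\|\delta_{n+1}\| \le \|b(u_{n+1}, x_n)\| + \frac{\|J_{\gamma_{n+1}}(u_{n+1}, x_n) - \Pi(u_{n+1}, x_n)\|}{\gamma_{n+1}}.$$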


Using Assumptions 5 and 6, we have

$$\mathbb{E}_n\|\delta_{n+1}\|^4 \le 8\int \|b(\xi, x_n)\|^4\,\mu(d\xi) + \frac{8}{\gamma_{n+1}^4}\int \|J_{\gamma_{n+1}}(\xi, x_n) - \Pi(\xi, x_n)\|^4\,\mu(d\xi) \le C(1 + \|x_n\|^{2p}).$$

Therefore, by Proposition 6.1-1., there exists a non-negative $c_1(\omega)$, which is
a.s. finite and satisfies $\mathbb{E}_n\|\delta_{n+1}\|^4 \le c_1(\omega)$ almost surely. By Lemma 6.1, it
also holds that $\sup_n \mathbb{E}\|\delta_n\|^4 < \infty$.

Consider an arbitrary point $u \in \mathrm{cl}(\mathcal{D})$. For any $\varepsilon > 0$, by Lemma 6.2, we
have

$$\|x_{n+1} - u\|^2 \le (1 + \varepsilon)\|\Pi(u_{n+1}, x_n) - u\|^2 + \gamma_{n+1}^2 C\|\delta_{n+1}\|^2.$$

Since $\Pi(u_{n+1}, \cdot)$ is firmly non-expansive as the projector onto a closed and
convex set, we have

$$\|\Pi(u_{n+1}, x_n) - u\|^2 \le \|x_n - u\|^2 - \|\Pi(u_{n+1}, x_n) - x_n\|^2.$$
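The firm non-expansiveness inequality used here can be checked numerically on a simple closed convex set. The sketch below is an illustration with an assumed set, a box $[lo, hi]^N$, which is not taken from the paper. It returns the gap $\|x - u\|^2 - \|P(x) - x\|^2 - \|P(x) - u\|^2$, which should be non-negative whenever $u$ belongs to the set.

```python
def project_box(x, lo, hi):
    """Euclidean projection of the vector x onto the box [lo, hi]^N."""
    return [min(max(c, lo), hi) for c in x]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def firm_nonexpansive_gap(x, u, lo, hi):
    """Return ||x - u||^2 - ||P(x) - x||^2 - ||P(x) - u||^2 for P = project_box.

    The gap is non-negative whenever u lies in the box.
    """
    p = project_box(x, lo, hi)
    return sq_dist(x, u) - sq_dist(p, x) - sq_dist(p, u)
```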

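Taking $u = \Pi(x_n)$, we obtain

$$d(x_{n+1})^2 \le \|x_{n+1} - \Pi(x_n)\|^2 \le (1 + \varepsilon)\big(d(x_n)^2 - d(u_{n+1}, x_n)^2\big) + \gamma_{n+1}^2 C\|\delta_{n+1}\|^2.$$

Taking the conditional expectation $\mathbb{E}_n$ at both sides of this inequality, using
Assumption 4 and choosing $\varepsilon$ small enough, we obtain the inequality
$\mathbb{E}_n d^2(x_{n+1}) \le \rho\, d^2(x_n) + \gamma_{n+1}^2 C\,\mathbb{E}_n\|\delta_{n+1}\|^2$, where $\rho \in [0, 1[$. It implies that
$d^2(x_n)$ tends to zero by the Robbins-Siegmund Theorem [41]. Moreover,
setting $\Delta_n = d(x_n)^2/\gamma_n$ and using the fact that $\gamma_n/\gamma_{n+1} \to 1$, we obtain
that

$$\mathbb{E}_n \Delta_{n+1} \le \rho \Delta_n + \gamma_{n+1} C\,\mathbb{E}_n\|\delta_{n+1}\|^2$$

for $n$ larger than some $n_0$.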

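By Lemma 6.2 and the firm non-expansiveness of $\Pi(u_{n+1}, \cdot)$, we also have

$$\|x_{n+1} - u\|^4 \le (1 + \varepsilon)\|\Pi(u_{n+1}, x_n) - u\|^4 + \gamma_{n+1}^4 C\|\delta_{n+1}\|^4$$
$$\le (1 + \varepsilon)\big(\|x_n - u\|^2 - \|\Pi(u_{n+1}, x_n) - x_n\|^2\big)^2 + \gamma_{n+1}^4 C\|\delta_{n+1}\|^4. \qquad (14)$$

We also set $u = \Pi(x_n)$ and apply the operator $\mathbb{E}_n$ at both sides of this
inequality. By Assumption 4, we have

$$\int \big(d(x)^2 - d(\xi, x)^2\big)^2\mu(d\xi) = d(x)^4 + \int d(\xi, x)^4\,\mu(d\xi) - 2 d(x)^2 \int d(\xi, x)^2\,\mu(d\xi)$$
$$\le d(x)^4 - d(x)^2 \int d(\xi, x)^2\,\mu(d\xi) \le (1 - C)\,d(x)^4$$

since $d(\xi, x) \le d(x)$. Integrating (14), we obtain

$$\mathbb{E}_n d^4(x_{n+1}) \le \rho\, d^4(x_n) + \gamma_{n+1}^4 C\,\mathbb{E}_n\|\delta_{n+1}\|^4,$$

where $\rho \in [0, 1[$, hence $\mathbb{E}_n \Delta_{n+1}^2 \le \rho \Delta_n^2 + \gamma_{n+1}^2 C\,\mathbb{E}_n\|\delta_{n+1}\|^4$ for $n$ larger than
some $n_0$. Taking the expectation at each side, iterating, and using the
boundedness of $(\mathbb{E}\|\delta_n\|^4)$, we obtain that $\mathbb{E}\Delta_n^2 \le C(\rho^n + \sum_{k=1}^{n} \gamma_k^2 \rho^{n-k})$.
Therefore,

$$\sum_{n=0}^{\infty} \mathbb{E}\Delta_n^2 \le C\Big(1 + \sum_{n=0}^{\infty} \gamma_n^2\Big) < \infty.$$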

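Consequently, $\Delta_n \to 0$ almost surely. Moreover, the martingale

$$Y_n = \sum_{k=1}^{n} (\Delta_k - \mathbb{E}_{k-1}\Delta_k)$$

converges almost surely and in $L^2(\Omega, \mathcal{F}, \mathbb{P}; \mathbb{R})$. Letting $D_m^n = \sum_{k=m+1}^{n} \Delta_k$,
where $m$ and $n$ are any two integers such that $0 < m < n$, we can write

$$D_m^n = \sum_{k=m+1}^{n} \mathbb{E}_{k-1}\Delta_k + Y_n - Y_m$$
$$\le \rho \sum_{k=m}^{n-1} \big(\Delta_k + C\gamma_{k+1}\mathbb{E}_k\|\delta_{k+1}\|^2\big) + Y_n - Y_m$$
$$\le \rho\Delta_m + \rho D_m^n + \rho C\sqrt{c_1(\omega)} \sum_{k=m+1}^{n} \gamma_k + Y_n - Y_m.$$

To conclude, we have

$$D_m^n \le \frac{\rho}{1 - \rho}\,\Delta_m + \frac{Y_n - Y_m}{1 - \rho} + \frac{\rho C\sqrt{c_1(\omega)}}{1 - \rho} \sum_{k=m+1}^{n} \gamma_k.$$

Since $\Delta_n \to 0$, and since $(Y_n(\omega))_{n \in \mathbb{N}}$ is almost surely a Cauchy sequence,
we obtain the desired result. □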


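Lemma 6.3. Let Assumptions 3 and 6 hold true. For any compact set $K$,
there exists a constant $C > 0$ and $\varepsilon \in\, ]0, 1]$ such that for all $x \in K$ and all
$\gamma > 0$,

$$\|h_\gamma(x)\| \le C + 2\,\frac{d(x)}{\gamma},$$

and moreover,

$$\int \big(\|Y_\gamma(\xi, x)\|^{1+\varepsilon} + \|b(\xi, x)\|^{1+\varepsilon}\big)\,\mu(d\xi) \le C\Big(1 + \Big(\frac{d(x)}{\gamma}\Big)^{1+\varepsilon}\Big).$$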

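Proof. Set $x \in K$, and introduce some $\tilde x \in \mathcal{D}$ such that $\|x - \tilde x\| \le 2 d(x)$.
Relying on the fact that $A_\gamma(\xi, \cdot)$ is $\gamma^{-1}$-Lipschitz continuous,

$$\|Y_\gamma(\xi, x)\| \le \|A_\gamma(\xi, \tilde x)\| + \frac{1}{\gamma}\|x - \gamma b(\xi, x) - \tilde x\|$$
$$\le \|A_0(\xi, \tilde x)\| + \|b(\xi, x)\| + 2\,\frac{d(x)}{\gamma}.$$

Therefore

$$\|h_\gamma(x)\| \le \int \|A_0(\xi, \tilde x)\|\,\mu(d\xi) + 2\int \|b(\xi, x)\|\,\mu(d\xi) + 2\,\frac{d(x)}{\gamma}.$$

The first two terms are independent of $\gamma$ and, by Assumptions 3 and 6, are
bounded functions of $x$ on the compact $K$. This proves the first statement
of the Lemma. Let $\varepsilon = \varepsilon(K)$ be the exponent defined in Assumption 3.
There exists a constant $C$ such that

$$\|Y_\gamma(\xi, x)\|^{1+\varepsilon} + \|b(\xi, x)\|^{1+\varepsilon} \le C\Big(\Big(\|A_0(\xi, \tilde x)\| + \|b(\xi, x)\| + 2\,\frac{d(x)}{\gamma}\Big)^{1+\varepsilon} + \|b(\xi, x)\|^{1+\varepsilon}\Big).$$

By Assumption 6 and since $\int \|b(\xi, x)\|^{1+\varepsilon}\mu(d\xi) \le 1 + \int \|b(\xi, x)\|^2\mu(d\xi)$, there
exists some (other) constant $C$ such that

$$\int \big(\|Y_\gamma(\xi, x)\|^{1+\varepsilon} + \|b(\xi, x)\|^{1+\varepsilon}\big)\,\mu(d\xi) \le C\Big(\int \|A_0(\xi, \tilde x)\|^{1+\varepsilon}\mu(d\xi) + 1 + \|x\|^2 + \Big(\frac{d(x)}{\gamma}\Big)^{1+\varepsilon}\Big).$$

The proof is concluded using Assumption 3. □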


End of the Proof of Theorem 3.1 


Recall (10). Given an arbitrary real number $T > 0$, we shall study the
asymptotic behavior of the family of functions $\{x(\tau_n + \cdot)\}_{n \in \mathbb{N}}$ on the compact
interval $[0, T]$.

Given $\delta > 0$, we have $\|H(t + \delta) - H(t)\| \le \int_t^{t+\delta} \|h_{\gamma_{r(s)+1}}(x_{r(s)})\|\,ds$. By
Proposition 6.1-1, the sequence $(x_n)$ is bounded a.s. Thus, by Lemma 6.3,
there exists a constant $c_1 = c_1(\omega)$ such that for almost every $\omega$,

$$\|H(t + \delta) - H(t)\| \le c_1\delta + 2\int_t^{t+\delta} \frac{d(x_{r(s)})}{\gamma_{r(s)+1}}\,ds$$
$$\le c_1\delta + \int_t^{t+\delta} \Big(1 + \frac{d(x_{r(s)})^2}{\gamma_{r(s)+1}^2}\Big)\,ds$$
$$= (c_1 + 1)\delta + \int_t^{t+\delta} \frac{d(x_{r(s)})^2}{\gamma_{r(s)+1}^2}\,ds$$
$$\le (c_1 + c_2 + 1)\delta + e(t)$$

for some $e(t) \to 0$, where the last inequality is due to Proposition 6.2.

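We also observe from Proposition 6.1 and Assumption 6 that $(M_n)$ is a
martingale in $L^2(\Omega, \mathcal{F}, \mathbb{P}; \mathbb{R}^N)$, that

$$\sup_n \mathbb{E}\|M_n\|^2 \le \sum_{k=1}^{\infty} \gamma_k^2\,\mathbb{E}\|\eta_k\|^2 \le 2\sum_{k=1}^{\infty} \gamma_k^2\,\mathbb{E}\big[\|Y_{\gamma_k}(u_k, x_{k-1})\|^2 + \|b(u_k, x_{k-1})\|^2\big],$$

and that the right-hand side is finite. Hence, $M_n$ converges almost surely.
Therefore, on a probability one set, the family of continuous time processes
$(M(\tau_n + \cdot) - M(\tau_n))_{n \in \mathbb{N}}$ converges to zero uniformly on $\mathbb{R}_+$. The consequence
of these observations is that on a probability one set, the family of processes
$\{z_n(\cdot)\}_{n \in \mathbb{N}}$, where $z_n(t) = x(\tau_n + t)$, is equicontinuous. Specifically, for each
$\varepsilon > 0$, there exists $\delta > 0$ such that

$$\limsup_n \sup_{0 \le t, s \le T,\ |t - s| \le \delta} \|z_n(t) - z_n(s)\| \le \varepsilon.$$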

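This family is moreover bounded by Proposition 6.1-1. By the Arzelà-Ascoli
theorem, it has an accumulation point for the uniform convergence on $[0, T]$,
for an arbitrary $T > 0$. From any sequence of integers, we can extract a
subsequence (which we still denote as $(z_n)$ with slight abuse), and a continuous
function $z(\cdot)$ on $[0, T]$, such that $(z_n)$ converges to $z$ uniformly on
$[0, T]$. Hence, for $t \in [0, T]$,

$$z(t) - z(0) = -\lim_{n \to \infty} \int_0^t h_{\gamma_{r(\tau_n+s)+1}}(x_{r(\tau_n+s)})\,ds$$
$$= -\lim_{n \to \infty} \int_0^t ds \int_\Xi \mu(d\xi)\,\big(g_n^{(a)}(\xi, s) + g_n^{(b)}(\xi, s)\big),$$

where we set $g_n^{(a)}(\xi, t) := Y_{\gamma_{r(\tau_n+t)+1}}(\xi, x_{r(\tau_n+t)})$ and $g_n^{(b)}(\xi, t) := b(\xi, x_{r(\tau_n+t)})$.

Define the mapping $g_n := (g_n^{(a)}, g_n^{(b)})$ on $\Xi \times [0, T] \to \mathbb{R}^{2N}$. Recalling that the
sequence $(x_n)$ belongs to a compact set, say $K$, let $\varepsilon \in\, ]0, 1]$ be the exponent
defined in Lemma 6.3. By the same Lemma,

$$\int_0^T ds \int_\Xi \mu(d\xi)\,\|g_n(\xi, s)\|^{1+\varepsilon} \le c\Big[T + \int_0^T \Big(\frac{d(x_{r(\tau_n+s)})}{\gamma_{r(\tau_n+s)+1}}\Big)^{1+\varepsilon} ds\Big]$$
$$\le c\Big[T + T^{\frac{1-\varepsilon}{2}}\Big(\int_0^T \frac{d(x_{r(\tau_n+s)})^2}{\gamma_{r(\tau_n+s)+1}^2}\,ds\Big)^{\frac{1+\varepsilon}{2}}\Big]$$
$$\le c_1$$

for some constants $c$ and $c_1$. Therefore, the sequence of functions $(g_n)$
is bounded in $L^{1+\varepsilon}(\Xi \times [0, T], \mathcal{T} \otimes \mathcal{B}([0, T]), \mu \otimes \lambda; \mathbb{R}^{2N})$, where $\lambda$ is the
Lebesgue measure on $[0, T]$. The statement extends to the sequence of functions

$$\big(G_n(\xi, t) = (g_n(\xi, t),\ \|g_n^{(a)}(\xi, t)\|,\ \|g_n^{(b)}(\xi, t)\|)\big)_n,$$

which is uniformly bounded in $L^{1+\varepsilon}(\Xi \times [0, T], \mathcal{T} \otimes \mathcal{B}([0, T]), \mu \otimes \lambda; \mathbb{R}^{2N+2})$.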

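We can extract from this sequence a subsequence that converges weakly in
this Banach space to a function $F : \Xi \times [0, T] \to \mathbb{R}^{2N+2}$. We decompose $F$
as $F(\xi, t) = (f(\xi, t), \kappa(\xi, t), \upsilon(\xi, t))$, where $\kappa$, $\upsilon$ are real-valued, and where
$f(\xi, t) = (f^{(a)}(\xi, t), f^{(b)}(\xi, t))$ with $f^{(a)}, f^{(b)} : \Xi \times [0, T] \to \mathbb{R}^N$. Using the
weak convergence $(g_n^{(a)}, g_n^{(b)}) \rightharpoonup (f^{(a)}, f^{(b)})$, we obtain

$$z(t) - z(0) = -\int_0^t ds \Big(\int_\Xi f^{(a)}(\xi, s)\,\mu(d\xi) + \int_\Xi f^{(b)}(\xi, s)\,\mu(d\xi)\Big).$$

It remains to prove that for almost every $t \in [0, T]$, $f^{(a)}(\cdot, t) \in A(\cdot, z(t))$
and $f^{(b)}(\cdot, t) \in B(\cdot, z(t))$ $\mu$-almost everywhere, along with $z(0) \in \mathrm{cl}(\mathcal{D})$.
This shows that indeed $z(t) = \Phi(z(0), t)$ for every $t \in [0, T]$, and it follows
that $x(t)$ is a.s. an APT of the differential inclusion (6).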

By Mazur's theorem, there exists a function $J : \mathbb{N} \to \mathbb{N}$ and a sequence
of sets of weights $(\{\alpha_{k,n},\ k = n, \ldots, J(n) : \alpha_{k,n} \ge 0,\ \sum_k \alpha_{k,n} = 1\})_n$ such
that the sequence of functions defined by

$$\bar G_n(\xi, s) = \sum_{k=n}^{J(n)} \alpha_{k,n}\, G_k(\xi, s)$$

converges strongly to $F$. In the same way, we define $\bar g_n(\xi, s) := \sum_k \alpha_{k,n}\, g_k(\xi, s)$,
and similarly for $\bar g_n^{(a)}$, $\bar g_n^{(b)}$. Extracting a further subsequence, we obtain the
$\mu \otimes \lambda$-almost everywhere convergence of $\bar G_n$ to $F$. By Fubini's theorem, for
almost every $t \in [0, T]$, there exists a $\mu$-negligible set such that for every $\xi$
outside this set, $\bar G_n(\xi, t) \to F(\xi, t)$. From now on to the end of this proof,
we fix such a $t \in [0, T]$.

As $d(x_n) \to 0$, $z(t) \in \mathrm{cl}(\mathcal{D})$ (this holds in particular when $t = 0$, hence
$z(0) \in \mathrm{cl}(\mathcal{D})$). Following the same arguments as in the proof of Proposition 3.1,
it holds that $z(t) \in \mathrm{cl}(D(\xi))$ for all $\xi$ outside a $\mu$-negligible set.

Define $\eta_n(\xi) := J_{\gamma_{m+1}}(\xi, x_m - \gamma_{m+1} b(\xi, x_m)) - z(t) + \gamma_{m+1} b(\xi, x_m)$ with
$m = r(\tau_n + t)$. Using the same approach as in the proof of Proposition 3.1,
it can be shown that, as $n \to \infty$, $\eta_n(\cdot)$ tends to zero almost surely along
a subsequence. We now consider an arbitrary $\xi$ outside a $\mu$-negligible set,
such that $\eta_n(\xi) \to 0$ and $z(t) \in \mathrm{cl}(D(\xi))$.

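Let $(u, v)$ be an arbitrary element of $A(\xi, \cdot)$. By the monotonicity of $A(\xi, \cdot)$,

$$\langle v - Y_\gamma(\xi, x),\ u - J_\gamma(\xi, x - \gamma b(\xi, x))\rangle \ge 0 \quad (\forall x \in \mathbb{R}^N,\ \gamma > 0),$$

and we obtain

$$\langle v - \bar g_n^{(a)}(\xi, t),\ u - z(t)\rangle = \sum_{k=n}^{J(n)} \alpha_{k,n}\,\langle v - g_k^{(a)}(\xi, t),\ u - z(t)\rangle$$
$$\ge \sum_{k=n}^{J(n)} \alpha_{k,n}\,\langle v - g_k^{(a)}(\xi, t),\ \eta_k(\xi) - \gamma_{r(\tau_k+t)+1}\, b(\xi, x_{r(\tau_k+t)})\rangle$$
$$\ge -\Big(\|v\| + \sum_{k=n}^{J(n)} \alpha_{k,n}\,\|g_k^{(a)}(\xi, t)\|\Big) \sup_{k \ge n}\big(\|\eta_k(\xi)\| + \gamma_{r(\tau_k+t)+1}\|b(\xi, x_{r(\tau_k+t)})\|\big).$$

The term enclosed in the first parenthesis of the above right-hand side converges
to $\|v\| + \kappa(\xi, t)$, while the supremum converges to zero using Assumption 6.
As $\bar g_n^{(a)}(\xi, t) \to f^{(a)}(\xi, t)$, it follows that

$$\langle v - f^{(a)}(\xi, t),\ u - z(t)\rangle \ge 0,$$

and by the maximality of $A(\xi, \cdot)$, it holds that $f^{(a)}(\xi, t) \in A(\xi, z(t))$. The
proof that $f^{(b)}(\xi, t) \in B(\xi, z(t))$ follows the same lines. □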

6.3 Proof of Corollary 3.1 

The proof is based on the study of the family of empirical measures of a
process close to $x(t)$. Using [8], we show that any accumulation point of this
family is an invariant measure for the flow $\Phi$. The corollary is then obtained
by showing that the mean of such an invariant measure belongs to $Z$.

Let $\tilde x_n = \Pi(x_n)$ be the projection of $x_n$ on $\mathrm{cl}(\mathcal{D})$, and write

$$\bar{\tilde x}_n = \frac{\sum_{k=1}^{n} \gamma_k \tilde x_k}{\sum_{k=1}^{n} \gamma_k}.$$
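The step-weighted averages above can be computed in one pass with an incremental update. The helper below is a small sketch (the function name is hypothetical) that works on scalars; it avoids keeping two growing sums by updating the running mean directly.

```python
def weighted_means(gammas, points):
    """Return the running means sum_{k<=n} g_k x_k / sum_{k<=n} g_k for n = 1..N."""
    means, mean, total = [], 0.0, 0.0
    for g, x in zip(gammas, points):
        total += g
        mean += (g / total) * (x - mean)  # incremental weighted-mean update
        means.append(mean)
    return means
```

For instance, with weights 1, 0.5, 0.5 and points 2, 4, 6, the successive averages are 2, 8/3 and 3.5.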

Let $\tilde x(\omega, t)$ be the $\Omega \times \mathbb{R}_+ \to \mathbb{R}^N$ process obtained from the piecewise
constant interpolation of the sequence $(\tilde x_n)$, namely $\tilde x(\omega, t) = \tilde x_n$ for
$t \in [\tau_n, \tau_{n+1}[$. On $(\Omega, \mathcal{F}, \mathbb{P})$, let $(\mathcal{F}_t)$ be the filtration generated by the process
obtained from the similar piecewise constant interpolation of $(u_n)$. With regard
to this filtration, $\tilde x$ is progressively measurable. It is moreover obvious
that $\tilde x(\omega, \cdot)$ is an APT for (6) for almost all values of $\omega$. Let $\{\nu_t(\omega, \cdot)\}_{t \ge 0}$
be the family of empirical measures of $\tilde x(\omega, \cdot)$. Observe from Theorem 3.1
that for almost all $\omega$, there is a compact set $K(\omega)$ such that the support
$\mathrm{supp}(\nu_t(\omega, \cdot))$ is included in $K(\omega)$ for all $t \ge 0$, which shows that the family
$\{\nu_t(\omega, \cdot)\}_{t \ge 0}$ is tight. Hence this family has accumulation points. Let $\nu$ be
the weak limit of $(\nu_{t_n})$ along some sequence $(t_n)$ of times. By [8, Th. 1], $\nu$
is invariant for the flow $\Phi$. Clearly, $\mathrm{supp}(\nu)$ is a compact subset of $\mathrm{cl}(\mathcal{D})$.
Moreover, for any $x \in \mathrm{supp}(\nu)$ and any $t > 0$, $\Phi(x, t) \in \mathrm{supp}(\nu)$. Indeed,
suppose for the sake of contradiction that there exists $t_0 > 0$ such that
$\Phi(x, t_0) \notin \mathrm{supp}(\nu)$. Then, $\Phi(B(x, \varepsilon) \cap \mathrm{cl}(\mathcal{D}), t_0) \subset \mathrm{supp}(\nu)^c$ for some $\varepsilon > 0$
by the continuity of $\Phi$ and the closedness of $\mathrm{supp}(\nu)$, where $B(x, \varepsilon)$ is the
closed ball with centre $x$ and radius $\varepsilon$. Since $\nu(\Phi(B(x, \varepsilon) \cap \mathrm{cl}(\mathcal{D}), t_0)) > 0$, we
obtain a contradiction. We also know from [42] or [20, Th. 5.3] that there
exists $\varphi : \mathrm{cl}(\mathcal{D}) \to Z$ such that

$$\forall x \in \mathrm{cl}(\mathcal{D}), \quad \frac{1}{t}\int_0^t \Phi(x, s)\,ds \xrightarrow[t \to \infty]{} \varphi(x).$$



By the dominated convergence and Fubini’s theorems, we now have 

$$\int x\,\nu(dx) = \frac{1}{t}\int_0^t\!\!\int \Phi(x, s)\,\nu(dx)\,ds = \int \Big(\frac{1}{t}\int_0^t \Phi(x, s)\,ds\Big)\nu(dx) \xrightarrow[t \to \infty]{} \int \varphi(x)\,\nu(dx),$$



which shows that $\int x\,\nu(dx) \in Z$ by the convexity of this set. Since we have
$\int x\,d\nu_{t_n} \to \int x\,d\nu$ as $n \to \infty$, we conclude that all the accumulation points
of $(\bar{\tilde x}_n)$ belong to $Z$. On the other hand, since $\mathcal{R}_{2p}(x^\star) \ne \emptyset$ for each $x^\star \in Z$,
a straightforward inspection of the proof of Proposition 6.1-3. shows that
$(\|x_n - x^\star\|)$ converges almost surely for each $x^\star \in Z$. From these two facts,
we obtain by [32] or [20, Lm 4.2] that $(\bar{\tilde x}_n)$ converges a.s. to a point of $Z$.
Since $\bar x_n - \bar{\tilde x}_n \to 0$ a.s., the convergence of $(\bar x_n)$ to the same point follows.


□ 


6.4 Proof of Corollary 3.2 

Let us start with a preliminary lemma. 

Lemma 6.4. Let $A \in \mathcal{M}$ be demipositive. Assume that the set $\mathrm{zer}(A)$ of
zeros of $A$ is not empty. Let $\Psi : \mathrm{cl}(\mathrm{dom}(A)) \times \mathbb{R}_+ \to \mathrm{cl}(\mathrm{dom}(A))$ be the
semiflow associated to the differential inclusion $\dot z(t) \in -A(z(t))$. Then, any
ICT set of $\Psi$ is included in $\mathrm{zer}(A)$.

Proof. Let $K$ be an ICT set and let $U$ be an arbitrary, bounded and open
set of $\mathbb{R}^N$ such that $K \cap U \ne \emptyset$. Define $G_t := \bigcup_{s \ge t} \Psi(U, s)$ for all $t \ge 0$. For
any $x^\star \in \mathrm{zer}(A)$ and any $x \in U$,

$$\|\Psi(x, t)\| \le \|\Psi(x, t) - \Psi(x^\star, t)\| + \|x^\star\| \le \|x - x^\star\| + \|x^\star\|.$$

Therefore, $G_0$ is a bounded set. By [4, Prop. 3.10], the set $G = \bigcap_{t \ge 0} \mathrm{cl}(G_t)$
is an attractor for $\Psi$ with a fundamental neighbourhood $U$. As $K \cap U \ne \emptyset$,
it follows that $K \subset G$ by [17, Corollary 5.4]. We finally check that
$G \subset \mathrm{zer}(A)$. Let $y \in G$, that is, $y = \lim_{k \to \infty} \Psi(x_k, t_k)$ for some sequence
$(x_k, t_k)$ such that $x_k \in U$ and $t_k \to \infty$. By compactness of $\mathrm{cl}(U)$, the
sequence $x_k$ can be chosen such that $x_k \to \bar x$ for some $\bar x \in \mathrm{cl}(U)$. Therefore,
$y = \lim_{k \to \infty} \Psi(\bar x, t_k)$, which by demipositivity of $A$, implies $y \in \mathrm{zer}(A)$
[9, 20]. □

By Theorem 3.1 and the discussion of Section 2.4, $L(x)$ is an ICT set.
Using Lemma 6.4 and the standing hypotheses, $L(x) \subset Z$. On the other
hand, since $\mathcal{R}_2(x^\star) \ne \emptyset$ for all $x^\star \in Z$, a straightforward inspection of the
proof of Proposition 6.1-3. shows that $\|x_n - x^\star\|$ converges almost surely for
any of those $x^\star$. By Opial's lemma [20, Lm 4.1], we obtain the almost sure
convergence of $(x_n)$ to a point of $Z$. □


6.5 Proof of Corollary 4.1 

Define the probability distribution Q := YllLo {Oj 1) • • •) "^}- 

On the space X x {0,... , m} equipped with the probability /r = (8* C; 

let ^ and define the random operators A and B by 




•), ifi = 0, 

Nc^■, otherwise. 


and ■= dxfivr)- 


The Aumann integral $\mathcal{B}(x) = \int \partial f(\eta, x)\,d\pi(\eta)$ coincides with $\partial F(x)$ by [43]
(see also the discussion in Section 4.1). Similarly, $\mathcal{A}(x) = \partial(G + \iota_C)(x)$.
The operator $\mathcal{A}$ is thus maximal. It holds that $\mathcal{A} + \mathcal{B} = \partial(F + G + \iota_C)$, which
is maximal, demipositive, and whose zeros coincide with the minimizers of
$F + G$ over $C$. The end of the proof consists in checking the assumptions of
Corollary 3.2. It follows the same line as [7] and is left to the reader. □


7 Perspectives 

Beyond the forward-backward algorithm, the concept of random maximal 
monotone operators can be used to study stochastic versions of other popular 
optimization algorithms that rely on the monotone operator theory. Our 
next research direction is therefore to extend our approach to other kinds of 
algorithms, such as the Douglas-Rachford algorithm, as a way to construct 
new families of stochastic approximation algorithms. In this perspective, 
the present paper may contain useful ingredients. 

It would also be interesting to weaken the assumption that the “innovation”
$(u_n)$ is an i.i.d. sequence. More involved random models are often
useful. Among those are the ones where the innovation is a Markov chain
controlled by the iterates. Such models are popular in the classical stochastic
approximation literature.

Another research direction includes the case where the step size of the 
algorithm is constant. In this context, the APT property does not hold and 
the iterates are no longer expected to converge a.s., due to the persistence 
of the random effects. Tools from the weak convergence theory of stochastic 
processes can be useful to address this setting. 

Finally, we believe that our algorithm can be useful to
address several specific applications in the field of convex optimization and
variational inequalities. An important aspect is to instantiate the algorithm
in practical scenarios related to machine learning, signal processing, or game
theory.


8 Conclusions 

The question of providing stochastic versions of well-known deterministic
algorithms relying on maximal monotone operators has become increasingly
popular. In particular, several authors have studied the effects of additive
random errors on the behavior of the iterates, showing that the errors have
no effect on the limiting points, provided that they satisfy an adequate
vanishing condition. The approach taken by this paper is conceptually different
in the sense that the operators themselves are assumed to be random. This
situation involves two key ingredients. The first one is the Aumann expectation
of the random operators. The second one is the notion of asymptotic
pseudotrajectory, borrowed from Benaïm and Hirsch, which is used to relate
the iterates to a continuous-time dynamical system.